Conditional signalling of reference picture list modification information

ABSTRACT

Innovations in signaling of reference picture list (“RPL”) modification information. For example, a video encoder evaluates a condition that depends at least in part on a variable indicating a number of total reference pictures. Depending on the results of the evaluation, the encoder signals in a bitstream a flag that indicates whether an RPL is modified according to syntax elements explicitly signaled in the bitstream. A video decoder evaluates the condition and, depending on results of the evaluation, parses from a bitstream a flag that indicates whether an RPL is modified according to syntax elements explicitly signaled in the bitstream. The condition can be evaluated as part of processing for an RPL modification structure that includes the flag, or as part of processing for a slice header. The encoder and decoder can also evaluate other conditions that affect syntax elements for list entries of the RPL modification information.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Patent Application No. 61/708,042, filed Sep. 30, 2012, the disclosure of which is hereby incorporated by reference.

BACKGROUND

Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.

Over the last two decades, various video codec standards have been adopted, including the H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (AVC or ISO/IEC 14496-10) standards and the MPEG-1 (ISO/IEC 11172-2), MPEG-4 Visual (ISO/IEC 14496-2) and SMPTE 421M standards. More recently, the HEVC standard has been under development. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve correct results in decoding. Aside from codec standards, various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.

Some types of parameters in a bitstream indicate information about reference pictures used during video encoding and decoding. A reference picture is, in general, a picture that contains samples that may be used for inter-picture prediction in the decoding process of other pictures. Typically, the other pictures follow the reference picture in decoding order and use the reference picture for motion-compensated prediction. In some video codec standards and formats, multiple reference pictures are available at a given time for use in motion-compensated prediction. Such video codec standards/formats specify how to manage the multiple reference pictures.

In general, a reference picture list (“RPL”) is a list of reference pictures used for motion-compensated prediction. In some video codec standards and formats, a reference picture set (“RPS”) is a set of reference pictures available for use in motion-compensated prediction at a given time, and an RPL contains some of the reference pictures in the RPS. Reference pictures in an RPL are addressed with reference indices; a reference index identifies a reference picture in the RPL. During encoding and decoding, an RPS can be updated to account for newly decoded pictures and for older pictures that are no longer used as reference pictures. Also, reference pictures within an RPL may be reordered so that more commonly used reference pictures are identified with reference indices that are more efficient to signal. In some recent codec standards, an RPL is constructed during encoding and decoding based on available information about the RPS, modifications according to rules and/or modifications signaled in the bitstream. Signaling of modifications for an RPL can consume a significant number of bits.

SUMMARY

In summary, the detailed description presents innovations in signaling of reference picture list (“RPL”) modification information. More generally, the innovations relate to different ways to avoid signaling RPL modification information when it would be unused or when values of such information can be inferred.

According to one aspect of the innovations described herein, a video encoder evaluates a condition. Depending on results of the evaluation, the encoder conditionally signals in a bitstream a flag that indicates whether an RPL is modified according to syntax elements explicitly signaled in the bitstream. A corresponding video decoder evaluates a condition. Depending on results of the evaluation, the decoder conditionally parses from a bitstream a flag that indicates whether an RPL is modified according to syntax elements explicitly signaled in the bitstream. In some example implementations, the RPL can be for a predictive (“P”) slice or a bi-predictive (“B”) slice. Alternatively, a higher-level syntax structure is conditionally signaled/parsed based on evaluation of the condition.

In some example implementations, if the RPL is not modified, a default RPL is constructed based on rules about RPL construction from an RPS. If the RPL is modified, a replacement RPL is constructed based on signaled RPL modification information that indicates selections of reference pictures from the RPS. Alternatively, modifications to reorder a default RPL, add a reference picture to the default RPL or remove a reference picture from the default RPL are signaled in a more fine-grained way to adjust the default RPL.
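The difference between the default RPL and a replacement RPL can be sketched as follows. This is a minimal illustration, not the described implementation: the helper name `build_rpl`, the string picture labels, and the wrap-around of the candidate list (borrowed from the way HEVC fills its initial list when there are more active entries than candidate pictures) are all assumptions. When no list entries are supplied, the first active candidates are used in order; otherwise each signaled list entry selects a candidate by index.

```python
from typing import List, Optional, Sequence


def build_rpl(
    rps_candidates: Sequence[str],
    num_active: int,
    list_entries: Optional[Sequence[int]] = None,
) -> List[str]:
    """Hypothetical helper: build an RPL from RPS candidates.

    rps_candidates: pictures from the RPS, already in the order the implicit
        rules would use (assumed non-empty).
    num_active: number of active entries in the RPL.
    list_entries: None if the RPL is not modified; otherwise list_entries[i]
        is the candidate index selected for RPL position i.
    """
    if not rps_candidates:
        raise ValueError("at least one reference picture is required")
    # Assumption: cycle the candidates until there are at least num_active.
    length = max(num_active, len(rps_candidates))
    temp = [rps_candidates[i % len(rps_candidates)] for i in range(length)]
    if list_entries is None:
        # Implicit approach: default RPL, candidates taken in order.
        return temp[:num_active]
    # Explicit approach: replacement RPL from signaled selections.
    return [temp[list_entries[i]] for i in range(num_active)]
```

For example, `build_rpl(["A", "B", "C"], 2)` gives the default `["A", "B"]`, while `build_rpl(["A", "B", "C"], 2, [2, 0])` gives the replacement `["C", "A"]`.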

For example, the condition that is evaluated depends at least in part on a variable that indicates a number of total reference pictures. In some example implementations, the condition is whether the value of the variable is greater than 1.

The condition can be evaluated as part of processing for an RPL modification structure that includes the flag. Or, the condition can be evaluated as part of processing for a slice header, in which case the RPL modification structure (including the flag) is conditionally signaled or parsed depending on results of the evaluation.
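A minimal sketch of the conditional flag, assuming the bitstream is modeled as a plain list of bits and the variable for the number of total reference pictures is passed in as `num_poc_total_curr`. The function names are hypothetical, and the inference of 0 (RPL not modified) when the flag is absent is an assumption based on the reasoning that, with a single reference picture, there is nothing to reorder or select. As we understand the final HEVC text, a comparable check (`NumPicTotalCurr > 1`) gates the whole RPL modification structure in the slice header, which corresponds to the slice-header variant described above.

```python
from typing import List, Tuple


def signal_rpl_mod_flag(bits: List[int], num_poc_total_curr: int, flag: int) -> None:
    """Encoder side (hypothetical): write the flag only if the condition holds."""
    if num_poc_total_curr > 1:
        bits.append(flag)
    # Otherwise nothing is written; the decoder infers the value.


def parse_rpl_mod_flag(bits: List[int], pos: int, num_poc_total_curr: int) -> Tuple[int, int]:
    """Decoder side (hypothetical): return (flag, new read position)."""
    if num_poc_total_curr > 1:
        return bits[pos], pos + 1
    # Assumption: an absent flag is inferred to be 0 (RPL not modified).
    return 0, pos
```

The encoder and decoder evaluate the same condition from information both already have, so skipping the flag never desynchronizes the read position.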

According to another aspect of the innovations described herein, a video encoder evaluates another condition. Depending on results of the evaluation, the encoder conditionally signals in a bitstream one or more syntax elements for list entries that indicate how to modify an RPL (e.g., replace the RPL, adjust the RPL). A corresponding video decoder evaluates the condition. Depending on results of the evaluation, the decoder conditionally parses from a bitstream one or more syntax elements for list entries that indicate how to modify an RPL (e.g., replace the RPL, adjust the RPL). In some example implementations, the RPL can be for a P slice or a B slice (with the condition evaluation and conditional signaling/parsing repeated for each of multiple RPLs for a B slice). For example, the other condition depends at least in part on a variable that indicates a number of total reference pictures, a number of active reference pictures for the RPL and/or whether weighted prediction is disabled. Different logic can be used to check whether weighted prediction is disabled depending on whether a current slice is a P slice or B slice and/or depending on which RPL is being signaled/parsed. In some example implementations, if (a) the number of total reference pictures is equal to 2 and (b) the number of active reference pictures for the RPL is equal to 1, then the one or more syntax elements for list entries are absent from the bitstream, and a value is inferred for one of the list entries. Further, in some example implementations, if (c) the number of total reference pictures is equal to 2, (d) the number of active reference pictures for the RPL is equal to 2 and (e) weighted prediction is disabled, then the one or more syntax elements for list entries are absent from the bitstream, and values are inferred for two of the list entries.
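The two cases in which list entries are absent can be written as a predicate. This is a sketch under stated assumptions: the function names are hypothetical, and the specific inferred values (1 for the single-entry case; 1 then 0 for the two-entry case) are our own assumption, reasoning that when the RPL is flagged as modified and only two pictures are available, the only useful modification is to pick the other picture or to swap the two (without weighted prediction, repeating a picture gains nothing).

```python
from typing import List, Optional


def list_entries_signaled(num_total: int, num_active: int, wp_disabled: bool) -> bool:
    """Hypothetical check: are list-entry syntax elements present?"""
    if num_total == 2 and num_active == 1:
        return False  # conditions (a) and (b)
    if num_total == 2 and num_active == 2 and wp_disabled:
        return False  # conditions (c), (d) and (e)
    return True


def inferred_list_entries(num_total: int, num_active: int, wp_disabled: bool) -> Optional[List[int]]:
    """Assumed inferred values when the list entries are absent."""
    if num_total == 2 and num_active == 1:
        return [1]  # assumption: select the second picture
    if num_total == 2 and num_active == 2 and wp_disabled:
        return [1, 0]  # assumption: swap the two pictures
    return None  # entries are signaled explicitly
```

For a B slice, the same checks would be repeated per RPL, with `num_active` and `wp_disabled` evaluated for that list.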

According to another aspect of the innovations described herein, a video encoder evaluates another condition. Depending on results of the evaluation, the encoder adjusts signaling in a bitstream of one or more syntax elements for list entries that indicate how to modify an RPL (e.g., replace the RPL, adjust the RPL). In particular, the length (in bits) of at least one of the one or more syntax elements is adjusted. A corresponding video decoder evaluates the condition. Depending on results of the evaluation, the decoder adjusts parsing from a bitstream of one or more syntax elements for list entries that indicate how to modify an RPL (again, where the length (in bits) of at least one of the one or more syntax elements is adjusted). For example, the condition depends at least in part on whether weighted prediction is disabled. Different logic can be used to check whether weighted prediction is disabled depending on whether a current slice is a P slice or B slice and/or depending on which RPL is being signaled/parsed. In some example implementations, for an index i for the list entries, if weighted prediction is disabled, the length (in bits) of at least one of the syntax elements decreases as i increases. Specifically, in some example implementations, if weighted prediction is disabled, the length of a given syntax element for list entry[i] is Ceil(Log2(NumPocTotalCurr - i)) bits. On the other hand, if weighted prediction is enabled, the length of the given syntax element for list entry[i] is Ceil(Log2(NumPocTotalCurr)) bits.
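The two length rules can be compared numerically. The sketch below computes Ceil(Log2(n)) exactly with integer arithmetic and returns the per-entry lengths; the function names are hypothetical. The requirement that the number of active entries not exceed NumPocTotalCurr when weighted prediction is disabled is an assumption (it keeps NumPocTotalCurr - i positive and matches the idea that each picture is listed at most once).

```python
from typing import List


def ceil_log2(n: int) -> int:
    """Ceil(Log2(n)) for n >= 1, computed exactly with integers."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1).bit_length()


def list_entry_lengths(num_poc_total_curr: int, num_active: int, wp_disabled: bool) -> List[int]:
    """Hypothetical helper: length in bits of list entry[i] for each i."""
    if wp_disabled:
        if num_active > num_poc_total_curr:
            raise ValueError("assumed: at most one entry per picture without WP")
        # Length shrinks as fewer candidate pictures remain.
        return [ceil_log2(num_poc_total_curr - i) for i in range(num_active)]
    # With weighted prediction, every entry may select any picture.
    return [ceil_log2(num_poc_total_curr)] * num_active
```

With NumPocTotalCurr = 5 and four active entries, the lengths are `[3, 2, 2, 1]` (8 bits) when weighted prediction is disabled, versus `[3, 3, 3, 3]` (12 bits) when it is enabled.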

The encoding or decoding can be implemented as part of a method, as part of a computing device adapted to perform the method or as part of one or more tangible computer-readable media storing computer-executable instructions for causing a computing device to perform the method.

The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an example computing system in which some described embodiments can be implemented.

FIGS. 2a and 2b are diagrams of example network environments in which some described embodiments can be implemented.

FIG. 3 is a diagram of an example encoder system in conjunction with which some described embodiments can be implemented.

FIG. 4 is a diagram of an example decoder system in conjunction with which some described embodiments can be implemented.

FIG. 5 is a diagram illustrating an example video encoder in conjunction with which some described embodiments can be implemented.

FIG. 6 is a diagram illustrating an example video decoder in conjunction with which some described embodiments can be implemented.

FIG. 7a is a table illustrating conditional signaling of a flag that indicates whether an RPL is modified, according to some example implementations.

FIGS. 7b and 7c are tables illustrating conditional signaling of one or more flags that indicate whether an RPL is modified, according to other example implementations.

FIGS. 8 and 9 are tables illustrating conditional signaling of syntax elements for list entries that indicate how to modify an RPL, according to some example implementations.

FIGS. 10 and 11 are flowcharts illustrating generalized techniques for conditional signaling and parsing, respectively, of a flag that indicates whether an RPL is modified.

FIGS. 12 and 13 are flowcharts illustrating generalized techniques for conditional signaling and parsing, respectively, of syntax elements for list entries that indicate how to modify an RPL.

FIGS. 14 and 15 are flowcharts illustrating generalized techniques for adjusting signaling and parsing, respectively, of syntax elements for list entries that indicate how to modify an RPL.

DETAILED DESCRIPTION

The detailed description presents innovations in signaling of reference picture list (“RPL”) modification information. These innovations can help avoid the signaling of RPL modification information when it would be unused or when values of such information can be inferred.

In some recent codec standards, a reference picture set (“RPS”) is a set of reference pictures available for use in motion-compensated prediction, and an RPL is constructed from the RPS. For the decoding process of a predictive (“P”) slice, there is one RPL, which is called RPL 0. For the decoding process of a bi-predictive (“B”) slice, there are two RPLs, which are called RPL 0 and RPL 1. At the beginning of the decoding process for a P slice, RPL 0 is derived from available information about RPL 0 (such as the set of reference pictures available at the decoder for decoding of the current picture), modifications according to rules and/or modifications signaled in the bitstream. Similarly, at the beginning of the decoding process for a B slice, RPL 0 and RPL 1 are derived from available information about RPL 0 and available information about RPL 1 (such as the set of reference pictures available at the decoder for decoding of the current picture), modifications according to rules and/or modifications signaled in the bitstream. More generally, an RPL is constructed during encoding and decoding based on available information about the RPL, modifications according to rules and/or modifications signaled in the bitstream. Signaling of modifications for an RPL can consume a significant number of bits. For some recent codec standards, there are inefficiencies in how RPL modification information is signaled.
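To make the "rules" for RPL 0 and RPL 1 concrete, the sketch below follows our reading of the default list initialization in the final HEVC text (H.265, clause 8.3.4), which may differ in detail from the drafts cited in this description. Pictures are identified by picture order count values; the function names are hypothetical. RPL 0 starts with short-term pictures that precede the current picture in output order, then those that follow it, then long-term pictures; RPL 1 swaps the first two groups. Each list is cycled if there are more active entries than pictures.

```python
from typing import List, Sequence


def _fill(order: Sequence[int], num_active: int) -> List[int]:
    """Cycle through the ordered pictures until num_active entries exist."""
    if not order:
        raise ValueError("no reference pictures available")
    length = max(num_active, len(order))
    temp = [order[i % len(order)] for i in range(length)]
    return temp[:num_active]


def initial_lists(st_before: Sequence[int], st_after: Sequence[int],
                  lt_curr: Sequence[int], num_active_l0: int,
                  num_active_l1: int, is_b_slice: bool) -> List[List[int]]:
    """Hypothetical helper: default RPL 0 (and RPL 1 for a B slice)."""
    rpl0 = _fill(list(st_before) + list(st_after) + list(lt_curr), num_active_l0)
    if not is_b_slice:
        return [rpl0]  # a P slice has only RPL 0
    rpl1 = _fill(list(st_after) + list(st_before) + list(lt_curr), num_active_l1)
    return [rpl0, rpl1]
```

For instance, with pictures 8 and 6 before, 12 after and long-term picture 0, a B slice with three and two active entries gets RPL 0 = `[8, 6, 12]` and RPL 1 = `[12, 8]`.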

The detailed description presents various innovations in the area of signaling of RPL modification information. In some situations, these innovations result in more efficient signaling of syntax elements for RPL modification information. For example, the detailed description describes conditional signaling of syntax elements for list entries that indicate how to modify an RPL. The detailed description also describes ways to use fewer bits to signal such syntax elements. As another example, the detailed description describes conditional signaling of a flag that indicates whether an RPL is modified.

In some example implementations, if the RPL is not modified, a default RPL is constructed according to an “implicit” approach using rules about RPL construction from an RPS. If the RPL is modified, a replacement RPL is constructed according to an “explicit” signaling approach using signaled RPL modification information that indicates selections of reference pictures from the RPS. Alternatively, modifications to reorder a default RPL, add a reference picture to it or remove a reference picture from it can be signaled in a more fine-grained way as specific changes relative to the default RPL.

Some of the innovations described herein are illustrated with reference to syntax elements and operations specific to the HEVC standard. For example, reference is made to the draft version JCTVC-I1003 of the HEVC standard: “High efficiency video coding (HEVC) text specification draft 7”, JCTVC-I1003_d5, 9th meeting of the Joint Collaborative Team on Video Coding (“JCT-VC”), Geneva, April 2012. See also the draft version entitled “High Efficiency Video Coding (HEVC) text specification draft 9,” JCTVC-K1003_d11, 11th meeting of the JCT-VC, Shanghai, October 2012. The innovations described herein can also be implemented for other standards or formats.

More generally, various alternatives to the examples described herein are possible. For example, some of the methods described herein can be altered by changing the ordering of the method acts described, or by splitting, repeating, or omitting certain method acts. The various aspects of the disclosed technology can be used in combination or separately. Different embodiments use one or more of the described innovations. Some of the innovations described herein address one or more of the problems noted in the background. Typically, a given technique/tool does not solve all such problems.

I. Example Computing Systems.

FIG. 1 illustrates a generalized example of a suitable computing system (100) in which several of the described innovations may be implemented. The computing system (100) is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems.

With reference to FIG. 1, the computing system (100) includes one or more processing units (110, 115) and memory (120, 125). In FIG. 1, this most basic configuration (130) is included within a dashed line. The processing units (110, 115) execute computer-executable instructions. A processing unit can be a general-purpose central processing unit (“CPU”), a processor in an application-specific integrated circuit (“ASIC”) or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 1 shows a central processing unit (110) as well as a graphics processing unit or co-processing unit (115). The tangible memory (120, 125) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The memory (120, 125) stores software (180) implementing one or more innovations for signaling of RPL modification information, in the form of computer-executable instructions suitable for execution by the processing unit(s).

A computing system may have additional features. For example, the computing system (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system (100), and coordinates activities of the components of the computing system (100).

The tangible storage (140) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing system (100). The storage (140) stores instructions for the software (180) implementing one or more innovations for signaling of RPL modification information.

The input device(s) (150) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system (100). For video encoding, the input device(s) (150) may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system (100). The output device(s) (160) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system (100).

The communication connection(s) (170) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.

The innovations can be described in the general context of computer-readable media. Computer-readable media are any available tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computing system (100), computer-readable media include memory (120, 125), storage (140), and combinations of any of the above.

The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.

The terms “system” and “device” are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed, and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.

The disclosed methods can also be implemented using specialized computing hardware configured to perform any of the disclosed methods. For example, the disclosed methods can be implemented by an integrated circuit (e.g., an application specific integrated circuit (“ASIC”) (such as an ASIC digital signal processing unit (“DSP”), a graphics processing unit (“GPU”), or a programmable logic device (“PLD”), such as a field programmable gate array (“FPGA”)) specially designed or configured to implement any of the disclosed methods.

For the sake of presentation, the detailed description uses terms like “determine” and “use” to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

II. Example Network Environments.

FIGS. 2a and 2b show example network environments (201, 202) that include video encoders (220) and video decoders (270). The encoders (220) and decoders (270) are connected over a network (250) using an appropriate communication protocol. The network (250) can include the Internet or another computer network.

In the network environment (201) shown in FIG. 2a, each real-time communication (“RTC”) tool (210) includes both an encoder (220) and a decoder (270) for bidirectional communication. A given encoder (220) can produce output compliant with the SMPTE 421M standard, the ISO/IEC 14496-10 standard (also known as H.264 or AVC), the HEVC standard, another standard, or a proprietary format, with a corresponding decoder (270) accepting encoded data from the encoder (220). The bidirectional communication can be part of a video conference, video telephone call, or other two-party communication scenario. Although the network environment (201) in FIG. 2a includes two real-time communication tools (210), the network environment (201) can instead include three or more real-time communication tools (210) that participate in multi-party communication.

A real-time communication tool (210) manages encoding by an encoder (220). FIG. 3 shows an example encoder system (300) that can be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another encoder system. A real-time communication tool (210) also manages decoding by a decoder (270). FIG. 4 shows an example decoder system (400), which can be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another decoder system.

In the network environment (202) shown in FIG. 2b, an encoding tool (212) includes an encoder (220) that encodes video for delivery to multiple playback tools (214), which include decoders (270). The unidirectional communication can be provided for a video surveillance system, web camera monitoring system, remote desktop conferencing presentation or other scenario in which video is encoded and sent from one location to one or more other locations. Although the network environment (202) in FIG. 2b includes two playback tools (214), the network environment (202) can include more or fewer playback tools (214). In general, a playback tool (214) communicates with the encoding tool (212) to determine a stream of video for the playback tool (214) to receive. The playback tool (214) receives the stream, buffers the received encoded data for an appropriate period, and begins decoding and playback.

FIG. 3 shows an example encoder system (300) that can be included in the encoding tool (212). Alternatively, the encoding tool (212) uses another encoder system. The encoding tool (212) can also include server-side controller logic for managing connections with one or more playback tools (214). FIG. 4 shows an example decoder system (400), which can be included in the playback tool (214). Alternatively, the playback tool (214) uses another decoder system. A playback tool (214) can also include client-side controller logic for managing connections with the encoding tool (212).

III. Example Encoder Systems.

FIG. 3 is a block diagram of an example encoder system (300) in conjunction with which some described embodiments may be implemented. The encoder system (300) can be a general-purpose encoding tool capable of operating in any of multiple encoding modes, such as a low-latency encoding mode for real-time communication, a transcoding mode, and a regular encoding mode for media playback from a file or stream, or it can be a special-purpose encoding tool adapted for one such encoding mode. The encoder system (300) can be implemented as an operating system module, as part of an application library or as a standalone application. Overall, the encoder system (300) receives a sequence of source video frames (311) from a video source (310) and produces encoded data as output to a channel (390). The encoded data output to the channel can include syntax elements that indicate RPL modification information.

The video source (310) can be a camera, tuner card, storage media, or other digital video source. The video source (310) produces a sequence of video frames at a frame rate of, for example, 30 frames per second. As used herein, the term “frame” generally refers to source, coded or reconstructed image data. For progressive video, a frame is a progressive video frame. For interlaced video, in example embodiments, an interlaced video frame is de-interlaced prior to encoding. Alternatively, two complementary interlaced video fields are encoded as an interlaced video frame or as separate fields. Aside from indicating a progressive video frame, the term “frame” or “picture” can indicate a single non-paired video field, a complementary pair of video fields, a video object plane that represents a video object at a given time, or a region of interest in a larger image. The video object plane or region can be part of a larger image that includes multiple objects or regions of a scene.

An arriving source frame (311) is stored in a source frame temporary memory storage area (320) that includes multiple frame buffer storage areas (321, 322, . . . , 32n). A frame buffer (321, 322, etc.) holds one source frame in the source frame storage area (320). After one or more of the source frames (311) have been stored in frame buffers (321, 322, etc.), a frame selector (330) periodically selects an individual source frame from the source frame storage area (320). The order in which frames are selected by the frame selector (330) for input to the encoder (340) may differ from the order in which the frames are produced by the video source (310); e.g., a frame may be selected ahead of frames that precede it in capture order, to facilitate temporally backward prediction. Before the encoder (340), the encoder system (300) can include a pre-processor (not shown) that performs pre-processing (e.g., filtering) of the frames before encoding. The pre-processing can also include color space conversion into primary and secondary components for encoding.
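As a rough illustration of why the selection order can differ from the capture order, the hypothetical helper below assumes a simple pattern in which every `anchor_interval`-th frame is an anchor (I or P) frame and the frames between two anchors are B frames that predict from both anchors. Each anchor is selected before the frames that precede it in capture order, so the later anchor is already reconstructed when those frames are encoded. The pattern and the function name are assumptions for illustration, not the frame selector's actual policy.

```python
from typing import List


def coding_order(num_frames: int, anchor_interval: int) -> List[int]:
    """Hypothetical frame selection order for an anchor + B-frame pattern.

    Frames are identified by capture (display) index. Frame 0 is the first
    anchor; every anchor_interval-th frame after it is also an anchor.
    """
    if anchor_interval < 1:
        raise ValueError("anchor_interval must be >= 1")
    if num_frames <= 0:
        return []
    order = [0]
    prev_anchor = 0
    for anchor in range(anchor_interval, num_frames, anchor_interval):
        order.append(anchor)  # select the later anchor first
        order.extend(range(prev_anchor + 1, anchor))  # then the frames between
        prev_anchor = anchor
    # Trailing frames have no later anchor; keep them in capture order.
    order.extend(range(prev_anchor + 1, num_frames))
    return order
```

With seven frames and an anchor every third frame, the selection order is `[0, 3, 1, 2, 6, 4, 5]`.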

The encoder (340) encodes the selected frame (331) to produce a coded frame (341) and also produces memory management control operation (“MMCO”) signals (342) or reference picture set (“RPS”) information. If the current frame is not the first frame that has been encoded, when performing its encoding process, the encoder (340) may use one or more previously encoded/decoded frames (369) that have been stored in a decoded frame temporary memory storage area (360). Such stored decoded frames (369) are used as reference pictures for inter-frame prediction of the content of the current source frame (331). Generally, the encoder (340) includes multiple encoding modules that perform encoding tasks such as motion estimation and compensation, frequency transforms, quantization and entropy coding. The exact operations performed by the encoder (340) can vary depending on compression format. The format of the output encoded data can be a Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), HEVC format or other format.

For example, within the encoder (340), an inter-coded, predicted frame is represented in terms of prediction from reference frames, which are examples of reference pictures. A motion estimator estimates motion of blocks or other sets of samples of a source frame (331) with respect to one or more reference frames (369). When multiple reference frames are used, the multiple reference frames can be from different temporal directions or the same temporal direction. The reference frames (reference pictures) can be part of one or more RPLs, with reference indices addressing the reference pictures in the RPL(s). RPL(s) are constructed during encoding so that new reference pictures are added when appropriate, older reference pictures that are no longer used for motion compensation are removed when appropriate, and reference pictures are reordered when appropriate. In some implementations, for example, when encoding a current picture, the encoder (340) determines an RPS that includes reference pictures in the decoded frame storage area (360), then creates one or more RPLs for encoding of a given slice of the current picture. An RPL can be created by applying rules about the selection of reference pictures available from the RPS (implicit approach), in which case RPL modification information is not explicitly signaled in the bitstream. Or, the RPL can be created by selecting specific reference pictures available from the RPS, where the reference pictures that are selected are indicated in RPL modification information that is signaled in the bitstream. Compared to an RPL that would be constructed by rules of the implicit approach, the RPL modification information can specify a replacement RPL as a list of reference pictures in the RPS.
Alternatively, the RPL modification information can, in a more fine-grained way, specify removal of one or more reference pictures, addition of one or more reference pictures and/or reordering of reference pictures in the RPL constructed by rules of the implicit approach.
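On the encoder side, once a desired RPL has been chosen, the RPL modification information for the replacement approach can be derived by locating each desired picture in the list that the implicit rules would produce. The sketch below is a hypothetical illustration: it returns flag 0 with no list entries when the desired RPL equals the default RPL, and otherwise flag 1 with one index per desired picture (the first matching position is used if a picture appears more than once).

```python
from typing import List, Sequence, Tuple


def derive_list_entries(temp_list: Sequence[str], desired_rpl: Sequence[str]) -> List[int]:
    """Hypothetical helper: index of each desired picture in the rule-based list."""
    entries = []
    for pic in desired_rpl:
        if pic not in temp_list:
            raise ValueError(f"{pic!r} is not available in the RPS")
        entries.append(temp_list.index(pic))
    return entries


def choose_rpl_signaling(temp_list: Sequence[str], desired_rpl: Sequence[str]) -> Tuple[int, List[int]]:
    """Return (modification flag, list entries) for the desired RPL."""
    default_rpl = list(temp_list[:len(desired_rpl)])
    if list(desired_rpl) == default_rpl:
        return 0, []  # implicit approach: nothing extra to signal
    return 1, derive_list_entries(temp_list, desired_rpl)
```

For a rule-based list `["A", "B", "C"]`, the desired RPL `["A", "B"]` needs no modification information, whereas `["C", "A"]` is signaled with list entries `[2, 0]`.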

When encoding an inter-coded frame, the encoder (340) can evaluate the results of motion compensation for which an RPL is not modified according to syntax elements explicitly signaled in the bitstream, and also evaluate the results of motion compensation for which the RPL is modified according to syntax elements explicitly signaled in the bitstream (or results of multiple different ways of modifying the RPL). The encoder (340) can decide to use the default RPL (no RPL modification information signaled in the bitstream) or a modified RPL (with RPL modification information signaled in the bitstream). When the RPL is modified (e.g., replaced, adjusted) compared to the default RPL, the encoder (340) can perform one or more of (a) reordering reference pictures for more efficient addressing with reference indices, (b) removing reference pictures based at least in part on frequency of use during encoding, and (c) adding reference pictures based at least in part on frequency of use during encoding. For example, the encoder (340) can decide to remove a given reference picture from the RPL after utilization of the reference picture for motion compensation falls below a threshold amount and/or according to other criteria. As another example, the encoder (340) can decide to add a given reference picture to the RPL if utilization of the reference picture for motion compensation is above a threshold amount and/or according to other criteria. As another example, the encoder (340) can decide how to reorder reference pictures in the RPL based on frequency of utilization of the respective reference pictures and/or according to other criteria.
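The three usage-based decisions can be combined in a simple heuristic. In this sketch, `usage` is assumed to hold the fraction of blocks that selected each RPS picture during a trial motion-estimation pass; the function name and the threshold values 0.05 and 0.20 are arbitrary assumptions for illustration, not values from the description.

```python
from typing import Dict, List, Sequence


def adapt_rpl(default_rpl: Sequence[str], rps: Sequence[str],
              usage: Dict[str, float], remove_below: float = 0.05,
              add_above: float = 0.20) -> List[str]:
    """Hypothetical heuristic: remove, add and reorder by utilization."""
    # (b) Remove default-RPL pictures whose utilization fell below a threshold.
    kept = [p for p in default_rpl if usage.get(p, 0.0) >= remove_below]
    # (c) Add RPS pictures outside the default RPL that are used often.
    added = [p for p in rps
             if p not in default_rpl and usage.get(p, 0.0) > add_above]
    # (a) Most-used pictures first, so they get the smallest reference
    # indices; sorted() is stable, so ties keep their earlier order.
    return sorted(kept + added, key=lambda p: -usage.get(p, 0.0))
```

With a default RPL `["A", "B", "C"]`, an RPS that also contains `"D"`, and utilizations of 0.1, 0.5, 0.01 and 0.3 for A, B, C and D, the resulting RPL is `["B", "D", "A"]`. The encoder would then compare it with the default RPL to decide whether signaling the modification is worth the bits.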

The motion estimator outputs motion information such as motion vector information, which is entropy coded. A motion compensator applies motion vectors to reference pictures to determine motion-compensated prediction values. The encoder determines the differences (if any) between a block's motion-compensated prediction values and corresponding original values. These prediction residual values are further encoded using a frequency transform, quantization and entropy encoding. Similarly, for intra prediction, the encoder (340) can determine intra-prediction values for a block, determine prediction residual values, and encode the prediction residual values (with a frequency transform, quantization and entropy encoding). In particular, the entropy coder of the encoder (340) compresses quantized transform coefficient values as well as certain side information (e.g., motion vector information, QP values, mode decisions, parameter choices, reference indices, RPL modification information). Typical entropy coding techniques include Exp-Golomb coding, arithmetic coding, differential coding, Huffman coding, run length coding, variable-length-to-variable-length (“V2V”) coding, variable-length-to-fixed-length (“V2F”) coding, LZ coding, dictionary coding, probability interval partitioning entropy coding (“PIPE”), and combinations of the above. The entropy coder can use different coding techniques for different kinds of information, and can choose from among multiple code tables within a particular coding technique.
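Of the techniques listed, unsigned Exp-Golomb coding (the ue(v) descriptor in H.264/AVC and HEVC) is commonly used for header-level side information. A minimal sketch follows; bits are represented as a string of '0'/'1' characters purely for readability, which is a simplification of a real bit writer.

```python
def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb codeword for a non-negative integer."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(value + 1)[2:]             # binary form of value + 1
    return "0" * (len(info) - 1) + info   # prefix of (length - 1) zeros


def ue_decode(bits: str, pos: int = 0):
    """Decode one ue(v) codeword starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":       # count the leading-zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1             # prefix + '1' + `zeros` info bits
    return int(bits[pos + zeros:end], 2) - 1, end


stream = ue_encode(0) + ue_encode(1) + ue_encode(3)   # "1" + "010" + "00100"
pos, values = 0, []
while pos < len(stream):
    value, pos = ue_decode(stream, pos)
    values.append(value)
print(values)
```

Small values get short codewords, which suits syntax elements whose small values are most frequent.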

The coded frames (341) and MMCO/RPS information (342) are processed by a decoding process emulator (350). The decoding process emulator (350) implements some of the functionality of a decoder, for example, decoding tasks to reconstruct reference pictures that are used by the encoder (340) in motion compensation. The decoding process emulator (350) uses the MMCO/RPS information (342) to determine whether a given coded frame (341) needs to be reconstructed and stored for use as a reference picture in inter-frame prediction of subsequent frames to be encoded. If the MMCO/RPS information (342) indicates that a coded frame (341) needs to be stored, the decoding process emulator (350) models the decoding process that would be conducted by a decoder that receives the coded frame (341) and produces a corresponding decoded frame (351). In doing so, when the encoder (340) has used decoded frame(s) (369) that have been stored in the decoded frame storage area (360), the decoding process emulator (350) also uses the decoded frame(s) (369) from the storage area (360) as part of the decoding process.

The decoded frame temporary memory storage area (360) includes multiple frame buffer storage areas (361, 362, . . . , 36n). The decoding process emulator (350) uses the MMCO/RPS information (342) to manage the contents of the storage area (360) in order to identify any frame buffers (361, 362, etc.) with frames that are no longer needed by the encoder (340) for use as reference pictures. After modeling the decoding process, the decoding process emulator (350) stores a newly decoded frame (351) in a frame buffer (361, 362, etc.) that has been identified in this manner.

The coded frames (341) and MMCO/RPS information (342) are also buffered in a temporary coded data area (370). The coded data that is aggregated in the coded data area (370) can contain, as part of the syntax of an elementary coded video bitstream, syntax elements that indicate RPL modification information. The coded data that is aggregated in the coded data area (370) can also include media metadata relating to the coded video data (e.g., as one or more parameters in one or more supplemental enhancement information (“SEI”) messages or video usability information (“VUI”) messages).

The aggregated data (371) from the temporary coded data area (370) are processed by a channel encoder (380). The channel encoder (380) can packetize the aggregated data for transmission as a media stream (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel encoder (380) can add syntax elements as part of the syntax of the media transmission stream. Or, the channel encoder (380) can organize the aggregated data for storage as a file (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel encoder (380) can add syntax elements as part of the syntax of the media storage file. Or, more generally, the channel encoder (380) can implement one or more media system multiplexing protocols or transport protocols, in which case the channel encoder (380) can add syntax elements as part of the syntax of the protocol(s). The channel encoder (380) provides output to a channel (390), which represents storage, a communications connection, or another channel for the output.

IV. Example Decoder Systems.

FIG. 4 is a block diagram of an example decoder system (400) in conjunction with which some described embodiments may be implemented. The decoder system (400) can be a general-purpose decoding tool capable of operating in any of multiple decoding modes such as a low-latency decoding mode for real-time communication and a regular decoding mode for media playback from a file or stream, or it can be a special-purpose decoding tool adapted for one such decoding mode. The decoder system (400) can be implemented as an operating system module, as part of an application library or as a standalone application. Overall, the decoder system (400) receives coded data from a channel (410) and produces reconstructed frames as output for an output destination (490). The coded data can include syntax elements that indicate RPL modification information.

The decoder system (400) includes a channel (410), which can represent storage, a communications connection, or another channel for coded data as input. The channel (410) produces coded data that has been channel coded. A channel decoder (420) can process the coded data. For example, the channel decoder (420) de-packetizes data that has been aggregated for transmission as a media stream (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel decoder (420) can parse syntax elements added as part of the syntax of the media transmission stream. Or, the channel decoder (420) separates coded video data that has been aggregated for storage as a file (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel decoder (420) can parse syntax elements added as part of the syntax of the media storage file. Or, more generally, the channel decoder (420) can implement one or more media system demultiplexing protocols or transport protocols, in which case the channel decoder (420) can parse syntax elements added as part of the syntax of the protocol(s).

The coded data (421) that is output from the channel decoder (420) is stored in a temporary coded data area (430) until a sufficient quantity of such data has been received. The coded data (421) includes coded frames (431) and MMCO/RPS information (432). The coded data (421) in the coded data area (430) can contain, as part of the syntax of an elementary coded video bitstream, syntax elements that indicate RPL modification information. The coded data (421) in the coded data area (430) can also include media metadata relating to the encoded video data (e.g., as one or more parameters in one or more SEI messages or VUI messages). In general, the coded data area (430) temporarily stores coded data (421) until such coded data (421) is used by the decoder (450). At that point, coded data for a coded frame (431) and MMCO/RPS information (432) are transferred from the coded data area (430) to the decoder (450). As decoding continues, new coded data is added to the coded data area (430) and the oldest coded data remaining in the coded data area (430) is transferred to the decoder (450).

The decoder (450) periodically decodes a coded frame (431) to produce a corresponding decoded frame (451). As appropriate, when performing its decoding process, the decoder (450) may use one or more previously decoded frames (469) as reference frames (reference pictures) for inter-frame prediction. The decoder (450) reads such previously decoded frames (469) from a decoded frame temporary memory storage area (460). Generally, the decoder (450) includes multiple decoding modules that perform decoding tasks such as entropy decoding, inverse quantization, inverse frequency transforms and motion compensation (which can create RPL(s) using RPL modification information). The exact operations performed by the decoder (450) can vary depending on compression format.

For example, the decoder (450) receives encoded data for a compressed frame or sequence of frames and produces output including decoded frame (451). In the decoder (450), a buffer receives encoded data for a compressed frame and makes the received encoded data available to an entropy decoder. The entropy decoder entropy decodes entropy-coded quantized data as well as entropy-coded side information (including reference indices, RPL modification information, etc.), typically applying the inverse of entropy encoding performed in the encoder. The decoder constructs one or more RPLs for reference pictures, with reference indices addressing the reference pictures in the RPL(s). The RPL(s) are constructed so that new reference pictures are added when appropriate, older reference pictures that are no longer used for motion compensation are removed when appropriate, and reference pictures are reordered when appropriate. In some implementations, for example, when decoding a current picture, the decoder (450) determines an RPS that includes reference pictures in the decoded frame storage area (460), then creates one or more RPLs for decoding of a given slice of the current picture. An RPL can be created by applying rules about the selection of reference pictures available from the RPS (implicit approach), in which case RPL modification information is not parsed from the bitstream. Or, the RPL can be created by selecting specific reference pictures available from the RPS, where the reference pictures that are selected are indicated in RPL modification information that is parsed from the bitstream. Compared to an RPL that would be constructed by rules of the implicit approach, the RPL modification information can specify a replacement RPL as a list of reference pictures in the RPS. Alternatively, the RPL modification information can, in a more fine-grained way, specify removal of one or more reference pictures, addition of one or more reference pictures and/or reordering of reference pictures in the RPL constructed by rules of the implicit approach.

A motion compensator applies motion information to one or more reference pictures to form motion-compensated predictions of sub-blocks and/or blocks (generally, blocks) of the frame being reconstructed. An intra prediction module can spatially predict sample values of a current block from neighboring, previously reconstructed sample values. The decoder (450) also reconstructs prediction residuals. An inverse quantizer inverse quantizes entropy-decoded data. An inverse frequency transformer converts the reconstructed frequency domain data into spatial domain information. For a predicted frame, the decoder (450) combines reconstructed prediction residuals with motion-compensated predictions to form a reconstructed frame. The decoder (450) can similarly combine prediction residuals with spatial predictions from intra prediction. A motion compensation loop in the video decoder (450) includes an adaptive de-blocking filter to smooth discontinuities across block boundary rows and/or columns in the decoded frame (451).

The decoded frame temporary memory storage area (460) includes multiple frame buffer storage areas (461, 462, . . . , 46n). The decoded frame storage area (460) is an example of a DPB. The decoder (450) uses the MMCO/RPS information (432) to identify a frame buffer (461, 462, etc.) in which it can store a decoded frame (451). The decoder (450) stores the decoded frame (451) in that frame buffer.

An output sequencer (480) uses the MMCO/RPS information (432) to identify when the next frame to be produced in output order is available in the decoded frame storage area (460). When the next frame (481) to be produced in output order is available in the decoded frame storage area (460), it is read by the output sequencer (480) and output to the output destination (490) (e.g., display). In general, the order in which frames are output from the decoded frame storage area (460) by the output sequencer (480) may differ from the order in which the frames are decoded by the decoder (450).
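The difference between decoding order and output order can be illustrated with a small simulation. This sketch rests on a simplifying assumption that is not in the text: the full set of POC values is known up front, so "the next frame in output order" is simply the smallest POC not yet output. The function name `output_events` is hypothetical, and the `waiting` set tracks only frames awaiting output (a real DPB may keep a frame after output if it is still a reference picture).

```python
def output_events(decode_order_pocs):
    """For each decoded frame, list the frames the output sequencer emits.

    Assumes every POC is known in advance, so output order is sorted POC order.
    """
    output_order = sorted(decode_order_pocs)
    waiting = set()          # decoded frames not yet output
    next_idx = 0
    events = []
    for poc in decode_order_pocs:
        waiting.add(poc)     # the decoder stores the newly decoded frame
        emitted = []
        # Output every frame that is next in output order and already decoded.
        while next_idx < len(output_order) and output_order[next_idx] in waiting:
            emitted.append(output_order[next_idx])
            waiting.discard(output_order[next_idx])
            next_idx += 1
        events.append((poc, emitted))
    return events


# Hierarchical-B style decoding order: frame 4 is decoded before frames 1-3.
for decoded, emitted in output_events([0, 4, 2, 1, 3]):
    print(decoded, emitted)
```

Frame 4 is decoded second but cannot be output until frames 1 through 3 have been decoded and output.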

V. Example Video Encoders.

FIG. 5 is a block diagram of a generalized video encoder (500) in conjunction with which some described embodiments may be implemented. The encoder (500) receives a sequence of video frames including a current frame (505) and produces encoded data (595) as output.

The encoder (500) is block-based and uses a block format that depends on implementation. Blocks may be further sub-divided at different stages, e.g., at the frequency transform and entropy encoding stages. For example, a frame can be divided into 64×64 blocks, 32×32 blocks or 16×16 blocks, which can in turn be divided into smaller blocks and sub-blocks of pixel values for coding and decoding.

The encoder (500) compresses predicted frames and intra-coded frames. For the sake of presentation, FIG. 5 shows an “intra path” through the encoder (500) for intra-frame coding and an “inter path” for inter-frame coding. Many of the components of the encoder (500) are used for both intra-frame coding and inter-frame coding. The exact operations performed by those components can vary depending on the type of information being compressed.

If the current frame (505) is a predicted frame, a motion estimator (510) estimates motion of blocks, sub-blocks or other sets of pixel values of the current frame (505) with respect to one or more reference frames (reference pictures). The frame store (520) buffers one or more reconstructed previous frames (525) for use as reference frames (reference pictures). When multiple reference pictures are used, the multiple reference pictures can be from different temporal directions or the same temporal direction. The multiple reference pictures can be represented in one or more RPLs, which are addressed with reference indices. The motion estimator (510) outputs as side information motion information (515) such as differential motion vector information, reference indices and RPL modification information. During encoding, the encoder (500) constructs RPL(s) so that new reference pictures are added when appropriate, older reference pictures that are no longer used for motion compensation are removed when appropriate, and reference pictures are reordered in the RPL(s) when appropriate.

In some implementations, when encoding a current frame, the encoder (500) determines an RPS that includes reference frames in the frame store (520). The encoder (500) typically determines the RPS for the first slice of the frame. On a slice-by-slice basis, the encoder (500) creates one or more RPLs for encoding of a given slice of the current frame. To create an RPL, the encoder (500) can apply rules about the selection of reference frames available from the RPS (implicit approach), in which case RPL modification information is not explicitly signaled in the encoded data (595). Or, to create the RPL, the encoder (500) can select specific reference frames available from the RPS, where the reference frames that are selected will be indicated in RPL modification information that is signaled in the encoded data (595). Compared to an RPL that would be constructed by rules of the implicit approach, the RPL modification information can specify a replacement RPL as a list of reference pictures in the RPS. Alternatively, the RPL modification information can, in a more fine-grained way, specify removal of one or more reference frames, addition of one or more reference frames and/or reordering of reference frames in the RPL implicitly constructed by rules.

The motion compensator (530) applies reconstructed motion vectors to the reconstructed reference frame(s) (525) when forming a motion-compensated current frame (535). The difference (if any) between a sub-block, block, etc. of the motion-compensated current frame (535) and the corresponding part of the original current frame (505) is the prediction residual (545) for the sub-block, block, etc. During later reconstruction of the current frame, reconstructed prediction residuals are added to the motion-compensated current frame (535) to obtain a reconstructed frame that is closer to the original current frame (505). In lossy compression, however, some information is still lost from the original current frame (505). The intra path can include an intra prediction module (not shown) that spatially predicts pixel values of a current block or sub-block from neighboring, previously reconstructed pixel values.

A frequency transformer (560) converts spatial domain video information into frequency domain (i.e., spectral, transform) data. For block-based video frames, the frequency transformer (560) applies a discrete cosine transform, an integer approximation thereof, or another type of forward block transform to blocks or sub-blocks of pixel value data or prediction residual data, producing blocks/sub-blocks of frequency transform coefficients. A quantizer (570) then quantizes the transform coefficients. For example, the quantizer (570) applies non-uniform, scalar quantization to the frequency domain data with a step size that varies on a frame-by-frame basis, slice-by-slice basis, block-by-block basis or other basis.

When a reconstructed version of the current frame is needed for subsequent motion estimation/compensation, an inverse quantizer (576) performs inverse quantization on the quantized frequency coefficient data. An inverse frequency transformer (566) performs an inverse frequency transform, producing blocks/sub-blocks of reconstructed prediction residuals or pixel values. For a predicted frame, the encoder (500) combines reconstructed prediction residuals (545) with motion-compensated predictions (535) to form a reconstructed version of the current frame (505), which may be used as a reference picture. (Although not shown in FIG. 5, in the intra path, the encoder (500) can combine prediction residuals with spatial predictions from intra prediction to reconstruct a frame that is used as a reference picture.) The frame store (520) buffers the reconstructed current frame for use as a reference picture in subsequent motion-compensated prediction.
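The quantization and inverse quantization steps, and why reconstruction is lossy, can be shown with a scalar quantizer. This is a simplified sketch: it uses a uniform step size with round-half-away-from-zero, whereas the text allows non-uniform quantization and step sizes that vary by frame, slice or block. The function names are hypothetical.

```python
import math


def quantize(coeffs, step):
    """Uniform scalar quantization (simplifying assumption), rounding half away from zero."""
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [q * step for q in levels]


coeffs = [100, -37, 12, 3, -2]        # example transform coefficients
levels = quantize(coeffs, 10)         # what the entropy coder would compress
recon = dequantize(levels, 10)        # what the encoder's reconstruction loop sees
print(levels, recon)
```

Small coefficients collapse to zero and the others are rounded, so the reconstructed frame only approximates the original, which is why the encoder reconstructs reference pictures exactly as a decoder would.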

A motion compensation loop in the encoder (500) includes an adaptive in-loop deblock filter before or after the frame store (520). The encoder (500) applies in-loop filtering to reconstructed frames to adaptively smooth discontinuities across boundaries in the frames.

The entropy coder (580) compresses the output of the quantizer (570) as well as motion information (515) and certain side information (e.g., QP values, reference indices, RPL modification information). The entropy coder (580) provides encoded data (595) to the buffer (590), which multiplexes the encoded data into an output bitstream. The encoded data (595) can include syntax elements that indicate RPL modification information. Section VII describes examples of such syntax elements.

A controller (not shown) receives inputs from various modules of the encoder. The controller evaluates intermediate results during encoding, for example, setting QP values and performing rate-distortion analysis. The controller works with other modules to set and change coding parameters during encoding. In particular, when deciding whether and how to modify (e.g., replace, adjust) RPL(s), the controller can control which reference pictures are added to RPL(s), control which pictures are removed from RPL(s), and reorder reference pictures in RPL(s) for more efficient addressing with reference indices. The controller can decide to remove reference pictures from the RPS (and hence RPLs), for example, by removing all reference pictures after a scene change, removing all reference pictures after encoding of a special kind of picture such as an IDR picture, removing a given reference picture after utilization of the reference picture for motion compensation falls below a threshold amount and/or removing reference pictures according to other criteria. The controller can decide to add reference pictures to the RPS, for example, by adding pictures according to picture type/slice types in the pictures, temporal layer for the pictures and/or other criteria. For an RPL, the controller can evaluate the results of motion compensation for which an RPL is not modified according to syntax elements explicitly signaled in the bitstream, and also evaluate the results of motion compensation for which the RPL is modified according to syntax elements explicitly signaled in the bitstream (or results of multiple different ways of modifying the RPL). The controller can evaluate results in terms of bitrate and/or quality. The controller can select the RPL implicitly constructed by rules (no RPL modification information) or select an RPL that has been modified (as specified with RPL modification information). To modify (e.g., replace, adjust) an RPL, compared to the implicitly constructed RPL, the controller can (a) reorder reference pictures for more efficient addressing with reference indices, (b) remove reference pictures based at least in part on frequency of use during encoding, and/or (c) add reference pictures based at least in part on frequency of use during encoding. For example, the controller can decide to remove a given reference picture from the RPL after utilization of the reference picture for motion compensation falls below a threshold amount and/or according to other criteria. Or, the controller can decide to add a given reference picture to the RPL if utilization of the reference picture for motion compensation is above a threshold amount and/or according to other criteria. Or, the controller can decide how to reorder reference pictures in the RPL based on frequency of utilization of the respective reference pictures and/or according to other criteria. The controller can construct the RPL(s) on a picture-by-picture basis, slice-by-slice basis, or some other basis.
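One way the controller could weigh bitrate against quality when choosing between the implicit RPL and modified RPLs is a Lagrangian cost J = D + λ·R. The text does not prescribe this; the cost function, the candidate tuples and the name `choose_rpl` are assumptions for illustration. The rate of a modified candidate would include the bits spent on RPL modification information.

```python
def choose_rpl(candidates, lam):
    """Return the label of the candidate with the lowest cost D + lam * R.

    candidates: (label, distortion, rate_bits) tuples (hypothetical format).
    rate_bits includes any RPL modification bits (none for the implicit RPL).
    """
    best = min(candidates, key=lambda c: c[1] + lam * c[2])
    return best[0]


trials = [("implicit", 1000.0, 500), ("modified", 900.0, 520)]
print(choose_rpl(trials, 2.0))   # cost 2000 vs. 1940
print(choose_rpl(trials, 10.0))  # cost 6000 vs. 6100
```

A larger λ favors lower bitrate, so the extra modification bits pay off only when they buy enough distortion reduction.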

Depending on implementation and the type of compression desired, modules of the encoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, encoders with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of encoders typically use a variation or supplemented version of the encoder (500). The relationships shown between modules within the encoder (500) indicate general flows of information in the encoder; other relationships are not shown for the sake of simplicity.

VI. Example Video Decoders.

FIG. 6 is a block diagram of a generalized decoder (600) in conjunction with which several described embodiments may be implemented. The decoder (600) receives encoded data (695) for a compressed frame or sequence of frames and produces output including a reconstructed frame (605), which may be used as a reference picture. For the sake of presentation, FIG. 6 shows an “intra path” through the decoder (600) for intra-frame decoding and an “inter path” for inter-frame decoding. Many of the components of the decoder (600) are used for both intra-frame decoding and inter-frame decoding. The exact operations performed by those components can vary depending on the type of information being decompressed.

A buffer (690) receives encoded data (695) for a compressed frame and makes the received encoded data available to the parser/entropy decoder (680). The encoded data (695) can include syntax elements that indicate RPL modification information. Section VII describes examples of such syntax elements. The parser/entropy decoder (680) entropy decodes entropy-coded quantized data as well as entropy-coded side information (including reference indices, RPL modification information, etc.), typically applying the inverse of entropy encoding performed in the encoder.

During decoding, the decoder (600) constructs RPL(s) so that new reference pictures are added when appropriate, older reference pictures that are no longer used for motion compensation are removed when appropriate, and reference pictures are reordered when appropriate. The decoder (600) can construct the RPL(s) based upon available information about the RPL(s) (e.g., available reference pictures in the RPS), modifications according to rules and/or modifications signaled as part of the encoded data (695). In some implementations, for example, when decoding a current frame, the decoder (600) determines an RPS that includes reference frames in the frame store (620). The decoder (600) typically determines the RPS for the first slice of the frame. On a slice-by-slice basis, the decoder (600) creates one or more RPLs for decoding of a given slice of the current frame. To create an RPL, in some cases (as indicated in the encoded data (695)), the decoder (600) applies rules about the selection of reference frames available from the RPS, in which case RPL modification information is not parsed from the encoded data (695). In other cases, to create the RPL, the decoder (600) selects specific reference frames available from the RPS, where the reference frames that are selected are indicated in RPL modification information that is parsed from the encoded data (695). The RPL modification information can specify a replacement RPL as a list of reference pictures in the RPS. Alternatively, the RPL modification information can, in a more fine-grained way, specify removal of one or more reference frames, addition of one or more reference frames and/or reordering of reference frames in the RPL implicitly constructed by rules.

A motion compensator (630) applies motion information (615) to one or more reference pictures (625) to form motion-compensated predictions (635) of sub-blocks and/or blocks of the frame (605) being reconstructed. The frame store (620) stores one or more previously reconstructed frames for use as reference pictures.

The intra path can include an intra prediction module (not shown) that spatially predicts pixel values of a current block or sub-block from neighboring, previously reconstructed pixel values. In the inter path, the decoder (600) reconstructs prediction residuals. An inverse quantizer (670) inverse quantizes entropy-decoded data. An inverse frequency transformer (660) converts the reconstructed frequency domain data into spatial domain information. For example, the inverse frequency transformer (660) applies an inverse block transform to frequency transform coefficients, producing pixel value data or prediction residual data. The inverse frequency transform can be an inverse discrete cosine transform, an integer approximation thereof, or another type of inverse frequency transform.

For a predicted frame, the decoder (600) combines reconstructed prediction residuals (645) with motion-compensated predictions (635) to form the reconstructed frame (605), which may be used as a reference picture. (Although not shown in FIG. 6, in the intra path, the decoder (600) can combine prediction residuals with spatial predictions from intra prediction to reconstruct a frame, which may be used as a reference picture.) A motion compensation loop in the decoder (600) includes an adaptive in-loop deblock filter (610) before or after the frame store (620). The decoder (600) applies in-loop filtering to reconstructed frames to adaptively smooth discontinuities across boundaries in the frames.

In FIG. 6, the decoder (600) also includes a post-processing deblock filter (608). The post-processing deblock filter (608) optionally smooths discontinuities in reconstructed frames. Other filtering (such as de-ring filtering) can also be applied as part of the post-processing filtering.

Depending on implementation and the type of decompression desired, modules of the decoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, decoders with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of decoders typically use a variation or supplemented version of the decoder (600). The relationships shown between modules within the decoder (600) indicate general flows of information in the decoder; other relationships are not shown for the sake of simplicity.

VII. Signaling of Reference Picture List Modification Information.

This section presents various innovations in the area of signaling of RPL modification information. In some situations, these innovations result in more efficient signaling of syntax elements for RPL modification information.

A. Reference Pictures and RPLs.

A reference picture is, in general, a picture that contains samples that may be used for inter-picture prediction in the decoding process of other pictures, which typically follow the reference picture in decoding order. Multiple reference pictures may be available at a given time for use in motion-compensated prediction.

In general, a reference picture list (“RPL”) is a list of reference pictures used for motion-compensated prediction. Reference pictures in the RPL are addressed with reference indices. A reference index identifies a reference picture in the RPL. During encoding and decoding, when an RPL is constructed, reference pictures in the RPL can change from time to time to add newly decoded pictures, drop older pictures that are no longer used as reference pictures and/or reorder reference pictures within the RPL to make signaling of the more commonly used reference indices more efficient. An encoder and decoder can follow the same rules to construct, modify, etc. their RPL(s). In addition to such rules (or instead of such rules), an encoder can signal information to a decoder that indicates how the decoder should construct, modify, etc. its RPL(s) to match the RPL(s) used by the encoder. Typically, an RPL is constructed during encoding and decoding based upon available information about the RPL (e.g., available pictures in the RPS), modifications according to rules and/or modifications signaled in the bitstream.

In some implementations, for a current picture, an encoder or decoder determines a reference picture set (“RPS”) that includes reference pictures in a decoded frame storage area such as a decoded picture buffer (“DPB”). The RPS is a description of the reference pictures used in the decoding process of the current and future coded pictures. Reference pictures included in the RPS are listed explicitly in the bitstream.

The encoder or decoder determines the RPS once per picture. For example, the decoder determines the RPS after decoding a slice header for a slice of the picture, using syntax elements signaled in the slice header. Reference pictures are identified with picture order count (“POC”) values, parts thereof and/or other information signaled in the bitstream. The encoder or decoder determines groups of short-term reference pictures and long-term reference pictures that may be used in inter-picture prediction of the current picture (and that may be used in inter-picture prediction of one or more of the pictures following the current picture in decoding order). (The encoder or decoder also determines groups of reference pictures that may be used in inter-picture prediction of one or more of the pictures following the current picture in decoding order, but are not used for the current picture.) Collectively, the groups of reference pictures are the RPS for the current picture.
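The grouping of RPS entries can be sketched as below. The group names (StCurrBefore, StCurrAfter, StFoll, LtCurr, LtFoll) follow HEVC-style naming and are an assumption here, as is the nearest-picture-first ordering; the `RefPic` record and `group_rps` function are hypothetical. The "Curr" groups hold pictures usable for the current picture, and the "Foll" groups hold pictures kept only for later pictures in decoding order.

```python
from dataclasses import dataclass


@dataclass
class RefPic:
    poc: int            # picture order count of the reference picture
    long_term: bool     # True for a long-term reference picture
    used_by_curr: bool  # may be used for inter prediction of the current picture


def group_rps(current_poc, refs):
    """Split RPS entries into HEVC-style groups (names assumed)."""
    groups = {"StCurrBefore": [], "StCurrAfter": [], "StFoll": [],
              "LtCurr": [], "LtFoll": []}
    for ref in refs:
        if ref.long_term:
            groups["LtCurr" if ref.used_by_curr else "LtFoll"].append(ref.poc)
        elif not ref.used_by_curr:
            groups["StFoll"].append(ref.poc)      # only for later pictures
        elif ref.poc < current_poc:
            groups["StCurrBefore"].append(ref.poc)
        else:
            groups["StCurrAfter"].append(ref.poc)
    # Assumed ordering: nearest picture first in each short-term "Curr" group.
    groups["StCurrBefore"].sort(reverse=True)
    groups["StCurrAfter"].sort()
    return groups


rps = group_rps(8, [RefPic(4, False, True), RefPic(6, False, True),
                    RefPic(12, False, True), RefPic(2, False, False),
                    RefPic(0, True, True)])
print(rps)
```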

For a given slice of the current picture, the encoder or decoder creates one or more RPLs. The encoder or decoder creates a temporary version of an RPL (e.g., RPL 0 or RPL 1) by combining the groups of short-term reference pictures and long-term reference pictures that may be used in inter-picture prediction of the current picture. To construct the RPL according to rules of an “implicit” approach, the encoder or decoder can use the reference pictures in the temporary version of the RPL, or use only some of the reference pictures in the temporary version of the RPL (e.g., the first x pictures in the temporary version of the RPL). For the “implicit” approach, RPL modification information is not signaled in the bitstream and is not parsed from the bitstream. In an “explicit” approach, to construct the RPL, the encoder or decoder uses RPL modification information signaled in/parsed from the bitstream to select specific reference pictures from the temporary version of the RPL. Compared to the RPL that would be constructed by rules of the “implicit” approach, the RPL modification information can specify removal of one or more reference pictures, addition of one or more reference pictures and/or reordering of reference pictures in the RPL.
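The implicit and explicit approaches can both be expressed in terms of the temporary version of the RPL. The sketch below is modeled on HEVC-style list construction and should be read under these assumptions: the temporary list concatenates the short-term groups (in swapped order for RPL 1) and the long-term group, cycling through them if more active entries are needed than there are pictures; the implicit RPL takes the first `num_active` entries; and each explicit list entry is an index into the temporary list. All names are hypothetical.

```python
def build_rpl(st_curr_before, st_curr_after, lt_curr, num_active,
              list_entries=None, list1=False):
    """Construct RPL 0 (or RPL 1 if list1) as a list of POCs.

    list_entries is None for the implicit approach; otherwise it holds one
    index per RPL position, each selecting a picture from the temporary list.
    """
    if list1:
        current = st_curr_after + st_curr_before + lt_curr
    else:
        current = st_curr_before + st_curr_after + lt_curr
    total = len(current)  # number of pictures usable for the current picture
    if total == 0:
        raise ValueError("no reference pictures available for the current picture")
    # Temporary list: cycle through the pictures until it is long enough.
    temp = [current[i % total] for i in range(max(num_active, total))]
    if list_entries is None:
        return temp[:num_active]               # implicit approach
    if len(list_entries) != num_active or any(not 0 <= e < total for e in list_entries):
        raise ValueError("invalid list entry values")
    return [temp[e] for e in list_entries]     # explicit approach


print(build_rpl([6, 4], [12], [0], 2))          # implicit RPL 0
print(build_rpl([6, 4], [12], [0], 2, [3, 2]))  # explicit: long-term picture first
print(build_rpl([6, 4], [12], [0], 2, list1=True))
```

With this construction, the explicit approach can reproduce any RPL the implicit approach produces, and can also reorder pictures or promote pictures that the implicit rule would have left out.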

Alternatively, an encoder or decoder uses another approach to creating an RPL from reference pictures.

B. Conditional Signaling of RPL Modification Flags.

According to one aspect of the innovations described herein, an encoder conditionally signals a flag that indicates whether an RPL is modified according to syntax elements explicitly signaled in the bitstream. A corresponding decoder conditionally parses such a flag.

In some example implementations, the flag is ref_pic_list_modification_flag_l0 or ref_pic_list_modification_flag_l1 (generally, the flag is ref_pic_list_modification_flag_lX, where X can be 0 or 1). If the value of the flag ref_pic_list_modification_flag_lX is equal to 1, the RPL X is specified explicitly as a list of list_entry_lX[i] values (again, with X being 0 or 1). If the value of the flag ref_pic_list_modification_flag_lX is equal to 0, the RPL X is determined implicitly. When ref_pic_list_modification_flag_lX is not present, it is inferred to be equal to 0.

FIG. 7a shows example syntax (700) for a ref_pic_lists_modification( ) syntax structure in example implementations. The structure may be signaled as part of a slice header. In the example syntax (700), ref_pic_list_modification_flag_lX is only sent when NumPocTotalCurr is greater than 1. NumPocTotalCurr is a variable that indicates a total number of reference pictures applicable for current encoding or decoding. In example implementations of encoding or decoding, when the variable NumPocTotalCurr is derived for a slice of a current picture, the variable indicates the count of short-term reference pictures and long-term reference pictures used as reference pictures for encoding or decoding of the current picture.

As shown in FIG. 7a, the conditional signaling of ref_pic_list_modification_flag_lX depends on the value of the variable NumPocTotalCurr. When NumPocTotalCurr is less than or equal to 1, there is no possibility for modification of the RPL, and hence no need to send the flag ref_pic_list_modification_flag_lX. In that case, the conditional signaling saves one flag for a P slice (list 0 only) or two flags for a B slice (lists 0 and 1). The modification in FIG. 7a includes the condition “if(NumPocTotalCurr>1)” for whether the flag ref_pic_list_modification_flag_lX is signaled. The condition can be checked for list 0 (for a P slice or B slice) and/or for list 1 (for a B slice).
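The FIG. 7a condition can be sketched on the decoder side as follows. FIG. 7a itself is not reproduced here, so the layout of the structure is an assumption. The sketch assumes one flag per list, followed, when the flag is 1, by num_ref_idx_lX_active_minus1 + 1 list entries of Ceil(Log2(NumPocTotalCurr)) bits each. The class and function names are hypothetical, and the bitstream is modeled as a string of '0'/'1' characters.

```python
from typing import Dict, List, Sequence, Tuple


def ceil_log2(n: int) -> int:
    """Ceil(Log2(n)) for n >= 1, computed exactly with integers."""
    return (n - 1).bit_length()


class BitReader:
    """Hypothetical reader of u(n) values from a '0'/'1' string, MSB first."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def u(self, n: int) -> int:
        if self.pos + n > len(self.bits):
            raise EOFError("bitstream exhausted")
        value = int(self.bits[self.pos:self.pos + n] or "0", 2)  # u(0) == 0
        self.pos += n
        return value


def parse_ref_pic_lists_modification(
        r: BitReader, num_poc_total_curr: int,
        num_active_minus1: Sequence[int],
        is_b_slice: bool) -> Dict[int, Tuple[int, List[int]]]:
    result = {}
    for x in ([0, 1] if is_b_slice else [0]):  # list 1 exists only in B slices
        # FIG. 7a condition: with at most one reference picture there is
        # nothing to modify, so the flag is absent and inferred to be 0.
        flag = r.u(1) if num_poc_total_curr > 1 else 0
        entries: List[int] = []
        if flag:
            bits = ceil_log2(num_poc_total_curr)
            entries = [r.u(bits) for _ in range(num_active_minus1[x] + 1)]
        result[x] = (flag, entries)
    return result
```

When NumPocTotalCurr is 1, the function reads no bits at all. That is the saving described above.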

Alternatively, the signaling and parsing of an RPL modification structure including one or more RPL modification flags (e.g., a ref_pic_lists_modification( ) structure) can be controlled by evaluating a condition as part of slice header processing or otherwise. FIG. 7b illustrates an approach to conditional signaling and parsing of flags ref_pic_list_modification_flag_l0 and ref_pic_list_modification_flag_l1 based on this condition. Specifically, FIG. 7b shows example syntax (750) for a slice header syntax structure that may include a ref_pic_lists_modification( ) syntax structure, which is depicted in the syntax (760) of FIG. 7c. For the example syntax (750) of the slice header, the flag lists_modification_present_flag is signaled in a picture parameter set that applies for the slice. When lists_modification_present_flag equals 0, the structure ref_pic_lists_modification( ) is not present in the slice header. When lists_modification_present_flag equals 1, the structure ref_pic_lists_modification( ) may be present in the slice header, depending on the value of the variable NumPocTotalCurr. If the variable NumPocTotalCurr is greater than 1, then the ref_pic_lists_modification( ) structure is signaled, as shown in the syntax (760) of FIG. 7c. Otherwise (the variable NumPocTotalCurr is not greater than 1), the ref_pic_lists_modification( ) structure is not signaled, and the values of list entries are inferred.
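Moving the same check up to the slice header, as in FIGS. 7b and 7c, can be sketched as follows. FIG. 7c is assumed here to contain an unconditional flag per list followed by fixed-length list entries. The function names and the representation of the bitstream as an iterator over '0'/'1' characters are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence


def read_u(bits: Iterator[str], n: int) -> int:
    """Read an n-bit unsigned integer, most significant bit first."""
    value = 0
    for _ in range(n):
        value = (value << 1) | int(next(bits))
    return value


def parse_slice_rpl_info(bits: Iterator[str],
                         lists_modification_present_flag: int,
                         num_poc_total_curr: int,
                         num_active_minus1: Sequence[int],
                         is_b_slice: bool) -> Dict[str, List[int]]:
    flags = [0, 0]  # inferred to be 0 when not present
    entries: List[List[int]] = [[], []]
    # Slice-header condition (FIG. 7b): the PPS must allow modification and
    # there must be more than one reference picture to choose from.
    if lists_modification_present_flag and num_poc_total_curr > 1:
        n_bits = (num_poc_total_curr - 1).bit_length()  # Ceil(Log2(NumPocTotalCurr))
        for x in ([0, 1] if is_b_slice else [0]):
            flags[x] = read_u(bits, 1)
            if flags[x]:
                entries[x] = [read_u(bits, n_bits)
                              for _ in range(num_active_minus1[x] + 1)]
    return {"flags": flags, "l0": entries[0], "l1": entries[1]}
```

Compared with the FIG. 7a placement, the check runs once per slice header rather than once per list inside the structure.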

In FIGS. 7a-7c, 8 and 9, the term “u(n)” represents an unsigned integer using n bits. When n is “v” (as in “u(v)”), the number of bits varies in a manner dependent on the value of other syntax elements. The parsing process for u(n) can be specified by the return value of a function that reads n bits as a binary representation of an unsigned integer, with the most significant bit written first.
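A byte-oriented reader for u(n), as described above, might look like the following minimal sketch; the class name `BitReader` is hypothetical. For u(v), the caller first computes n from other syntax elements, for example Ceil(Log2(NumPocTotalCurr)) for a list entry.

```python
class BitReader:
    """Reads u(n) values from a byte buffer, most significant bit first."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # position in bits

    def u(self, n: int) -> int:
        if self.pos + n > 8 * len(self.data):
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1  # MSB of each byte first
            value = (value << 1) | bit
            self.pos += 1
        return value


r = BitReader(bytes([0b10110010]))
print(r.u(1), r.u(3), r.u(4))  # 1 3 2
```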

C. Signaling of Syntax Elements for List Entries.

According to another aspect of the innovations described herein, an encoder conditionally signals syntax elements for list entries that indicate how to modify an RPL. A corresponding decoder conditionally parses such syntax elements.

In some example implementations, the syntax elements are list_entry_l0[i] syntax elements for RPL 0 or list_entry_l1[i] syntax elements for RPL 1 (generally, the syntax element is list_entry_lX[i], where X can be 0 or 1). FIG. 8 shows example syntax (800) for a ref_pic_lists_modification( ) syntax structure, which may be signaled as part of a slice header. In the example syntax (800), the syntax element list_entry_lX[0] is conditionally signaled in the bitstream. In particular, when NumPocTotalCurr is equal to 2 and num_ref_idx_lX_active_minus1 is equal to 0, the syntax element list_entry_lX[0] is not signaled in the bitstream. The variable num_ref_idx_lX_active_minus1 indicates the maximum reference index for the RPL X that may be used to decode a slice. The num_ref_idx_lX_active_minus1 variable can have a default value (e.g., a value from 0 to 15, as specified in the applicable picture parameter set), or num_ref_idx_lX_active_minus1 can have a value signaled in a slice header for the current slice.

As shown in FIG. 8, even when ref_pic_list_modification_flag_lX indicates that RPL modification information is signaled in the bitstream, the signaling of list_entry_lX[0] depends on NumPocTotalCurr and num_ref_idx_lX_active_minus1. When NumPocTotalCurr is equal to 2 and num_ref_idx_lX_active_minus1 is equal to 0, the RPL has a single active entry with only two possible choices (the default value of 0 or the non-default value of 1). The value of list_entry_lX[0] can therefore be inferred from ref_pic_list_modification_flag_lX. A flag value of 0 implies the default entry 0, and a flag value of 1 implies the entry 1.

Thus, FIG. 8 includes a condition for whether syntax elements for list entries are signaled. For RPL 0, the condition is “if(ref_pic_list_modification_flag_l0 && !(NumPocTotalCurr==2 && num_ref_idx_l0_active_minus1==0)).” For RPL 1, the condition is “if(ref_pic_list_modification_flag_l1 && !(NumPocTotalCurr==2 && num_ref_idx_l1_active_minus1==0)).”

In the example of FIG. 8, list_entry_l0[i] specifies the index of the reference picture in RefPicListTemp0 (a temporary version of RPL 0) to be placed at the current position of RPL 0. The length of the list_entry_l0[i] syntax element is Ceil(Log2(NumPocTotalCurr)) bits. The value of list_entry_l0[i] is in the range of 0 to NumPocTotalCurr−1, inclusive. If NumPocTotalCurr is equal to 2 and num_ref_idx_l0_active_minus1 is equal to 0, the syntax element list_entry_l0[0] is inferred to be equal to ref_pic_list_modification_flag_l0. Otherwise, when the syntax element list_entry_l0[i] is not present, it is inferred to be equal to 0.

In the example of FIG. 8, list_entry_l1[i] specifies the index of the reference picture in RefPicListTemp1 (a temporary version of RPL 1) to be placed at the current position of RPL 1. The length of the list_entry_l1[i] syntax element is Ceil(Log2(NumPocTotalCurr)) bits. The value of list_entry_l1[i] is in the range of 0 to NumPocTotalCurr−1, inclusive. If NumPocTotalCurr is equal to 2 and num_ref_idx_l1_active_minus1 is equal to 0, the syntax element list_entry_l1[0] is inferred to be equal to ref_pic_list_modification_flag_l1. Otherwise, when the syntax element list_entry_l1[i] is not present, it is inferred to be equal to 0.
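The FIG. 8 parsing and inference rules for one list can be sketched as follows. This assumes the modification flag has already been parsed. The helper `make_reader` and the function names are hypothetical, and the bitstream is again a '0'/'1' string.

```python
from typing import Callable, List


def ceil_log2(n: int) -> int:
    """Ceil(Log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def make_reader(bits: str) -> Callable[[int], int]:
    """Hypothetical helper: returns read_u(n) over a '0'/'1' string, MSB first."""
    pos = 0

    def read_u(n: int) -> int:
        nonlocal pos
        if pos + n > len(bits):
            raise EOFError("bitstream exhausted")
        value = int(bits[pos:pos + n] or "0", 2)
        pos += n
        return value

    return read_u


def parse_list_entries_fig8(read_u: Callable[[int], int], modification_flag: int,
                            num_poc_total_curr: int,
                            num_active_minus1: int) -> List[int]:
    count = num_active_minus1 + 1
    if num_poc_total_curr == 2 and num_active_minus1 == 0:
        # One active entry, two candidates: the flag alone identifies the picture.
        return [modification_flag]
    if modification_flag:
        bits = ceil_log2(num_poc_total_curr)  # fixed length in FIG. 8
        return [read_u(bits) for _ in range(count)]
    return [0] * count  # not present: inferred to be 0
```

With NumPocTotalCurr = 2 and a single active entry, no list-entry bits are read. The returned entry equals the flag.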

FIG. 9 shows another example syntax (900) for a ref_pic_lists_modification( ) syntax structure, which may be signaled as part of a slice header. In the example syntax (900), the syntax element list_entry_lX[0] is conditionally signaled in the bitstream. Compared to the example syntax of FIG. 8, however, the condition that is checked is different. Also, the signaling of syntax elements for list_entry_lX[ ] may be adjusted depending on whether weighted prediction is used.

According to FIG. 9, whether weighted prediction is enabled or disabled affects how syntax elements for list entries are signaled in the bitstream. For P slices with weighted_pred_flag equal to 0 or for B slices with weighted_bipred_flag equal to 0, weighted prediction is disabled. According to the example syntax (900) of FIG. 9, when weighted prediction is disabled, list_entry_lX[0] and list_entry_lX[1] are not sent when NumPocTotalCurr is equal to 2 and num_ref_idx_lX_active_minus1 is equal to 1. In such a case, list_entry_lX[0] and list_entry_lX[1] are inferred to be 1 and 0, respectively, since RPL modification would not have been needed for the only other possibility (that is, list_entry_lX[0] and list_entry_lX[1] being equal to 0 and 1, respectively).

Thus, FIG. 9 includes a condition for whether syntax elements for list entries are signaled. For RPL 0, the condition is “if(ref_pic_list_modification_flag_l0 && !(NumPocTotalCurr==2 && num_ref_idx_l0_active_minus1==0) && !(NumPocTotalCurr==2 && num_ref_idx_l0_active_minus1==1 && ((weighted_pred_flag !=1 && slice_type==P) || (weighted_bipred_flag !=1 && slice_type==B)))).” For RPL 1, the condition is “if(ref_pic_list_modification_flag_l1 && !(NumPocTotalCurr==2 && num_ref_idx_l1_active_minus1==0) && !(NumPocTotalCurr==2 && num_ref_idx_l1_active_minus1==1 && weighted_bipred_flag !=1)).”
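The two FIG. 9 conditions can be transcribed into one Python predicate as a minimal sketch. The function name is hypothetical, and slice types are assumed to be passed as the strings "P" and "B". The parameter `x` selects RPL 0 or RPL 1, which use different checks for whether weighted prediction is disabled.

```python
def list_entries_signaled_fig9(x: int, modification_flag: int,
                               num_poc_total_curr: int, num_active_minus1: int,
                               slice_type: str, weighted_pred_flag: int,
                               weighted_bipred_flag: int) -> bool:
    """True when list_entry_lX[] syntax elements are present (FIG. 9 condition)."""
    if x == 0:
        # RPL 0 may belong to a P slice or a B slice.
        wp_disabled = ((weighted_pred_flag != 1 and slice_type == "P") or
                       (weighted_bipred_flag != 1 and slice_type == "B"))
    else:
        # RPL 1 is used only by B slices.
        wp_disabled = weighted_bipred_flag != 1
    one_of_two = num_poc_total_curr == 2 and num_active_minus1 == 0
    swap_of_two = (num_poc_total_curr == 2 and num_active_minus1 == 1
                   and wp_disabled)
    return bool(modification_flag) and not one_of_two and not swap_of_two
```

For example, a P slice with two reference pictures, two active entries and weighted_pred_flag equal to 0 sends no list entries. If weighted_pred_flag is 1, the same slice sends them.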

Furthermore, even in cases in which NumPocTotalCurr is not equal to 2 or num_ref_idx_lX_active_minus1 is not equal to 1, when weighted prediction is disabled (for P slices, weighted_pred_flag equal to 0; for B slices, weighted_bipred_flag equal to 0), the length of the list_entry_lX[i] syntax element is limited to Ceil(Log2(NumPocTotalCurr−i)) bits. In this case, it is only useful to place each reference picture once in the list, because without weighted prediction a repeated reference picture would give the same prediction as its first occurrence. Thus the number of useful possibilities decreases as the index i increases.
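As a worked example (numbers chosen for illustration), take NumPocTotalCurr = 5 with five active entries. Fixed-length coding needs 5 × 3 = 15 bits. The decreasing lengths need Ceil(Log2(5)) + Ceil(Log2(4)) + Ceil(Log2(3)) + Ceil(Log2(2)) + Ceil(Log2(1)) = 3 + 2 + 2 + 1 + 0 = 8 bits. The following sketch computes both sets of lengths; the function names are hypothetical.

```python
from typing import List


def ceil_log2(n: int) -> int:
    """Ceil(Log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def list_entry_lengths(num_poc_total_curr: int, num_active: int,
                       weighted_prediction_disabled: bool) -> List[int]:
    """Bit length of list_entry_lX[i] for i = 0 .. num_active - 1."""
    if not weighted_prediction_disabled:
        return [ceil_log2(num_poc_total_curr)] * num_active
    if num_active > num_poc_total_curr:
        # Without weighted prediction each picture is placed at most once.
        raise ValueError("more active entries than distinct reference pictures")
    return [ceil_log2(num_poc_total_curr - i) for i in range(num_active)]
```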

In the example of FIG. 9, list_entry_l0[i] specifies the index of the reference picture in RefPicListTemp0 (a temporary version of RPL 0) to be placed at the current position of RPL 0. When weighted prediction is disabled (for P slices, weighted_pred_flag equal to 0; for B slices, weighted_bipred_flag equal to 0), the length of the list_entry_l0[i] syntax element is Ceil(Log2(NumPocTotalCurr−i)) bits. Otherwise, the length of the list_entry_l0[i] syntax element is Ceil(Log2(NumPocTotalCurr)) bits. If NumPocTotalCurr is equal to 2 and num_ref_idx_l0_active_minus1 is equal to 0, the syntax element list_entry_l0[0] is inferred to be equal to ref_pic_list_modification_flag_l0 (as in the example of FIG. 8). Otherwise, if NumPocTotalCurr is equal to 2, num_ref_idx_l0_active_minus1 is equal to 1 and weighted prediction is disabled (when weighted_pred_flag is equal to 0 and the current slice is a P slice, or weighted_bipred_flag is equal to 0 and the current slice is a B slice), the syntax elements list_entry_l0[0] and list_entry_l0[1] are inferred to be equal to 1 and 0, respectively. Otherwise, when the syntax element list_entry_l0[i] is not present, it is inferred to be equal to 0.

If weighted prediction is disabled (when weighted_pred_flag is equal to 0 and the current slice is a P slice, or weighted_bipred_flag is equal to 0 and the current slice is a B slice), the value of list_entry_l0[i] is in the range of 0 to NumPocTotalCurr−(i+1), inclusive, and the list RefPicListTemp0 is shortened by removal of each entry list_entry_l0[i] from the list RefPicListTemp0 after the entry value is parsed. Otherwise, the value of list_entry_l0[i] is in the range of 0 to NumPocTotalCurr−1, inclusive.

In the example of FIG. 9, list_entry_l1[i] specifies the index of the reference picture in RefPicListTemp1 (a temporary version of RPL 1) to be placed at the current position of RPL 1. If weighted prediction is disabled (weighted_bipred_flag is equal to 0, since only a B slice uses list 1), the length of the list_entry_l1[i] syntax element is Ceil(Log2(NumPocTotalCurr−i)) bits. Otherwise, the length of the list_entry_l1[i] syntax element is Ceil(Log2(NumPocTotalCurr)) bits. If NumPocTotalCurr is equal to 2 and num_ref_idx_l1_active_minus1 is equal to 0, the syntax element list_entry_l1[0] is inferred to be equal to ref_pic_list_modification_flag_l1 (as in the example of FIG. 8). Otherwise, if NumPocTotalCurr is equal to 2, num_ref_idx_l1_active_minus1 is equal to 1 and weighted prediction is disabled (weighted_bipred_flag is equal to 0; the current slice is a B slice), the syntax elements list_entry_l1[0] and list_entry_l1[1] are inferred to be equal to 1 and 0, respectively. Otherwise, when the syntax element list_entry_l1[i] is not present, it is inferred to be equal to 0.

If weighted prediction is disabled (weighted_bipred_flag is equal to 0; the current slice is a B slice), the value of list_entry_l1[i] is in the range of 0 to NumPocTotalCurr−(i+1), inclusive, and the list RefPicListTemp1 is shortened by removal of each entry list_entry_l1[i] from the list RefPicListTemp1 after the entry value is parsed. Otherwise, the value of list_entry_l1[i] is in the range of 0 to NumPocTotalCurr−1, inclusive.
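The shortened-list parsing described above can be sketched as follows for the case where weighted prediction is disabled. Each parsed entry indexes the remaining pictures, and the chosen picture is then removed. The sketch assumes the temporary list holds exactly NumPocTotalCurr pictures (given as POC values). The helper `make_reader` and the function names are hypothetical.

```python
from typing import Callable, List, Sequence


def make_reader(bits: str) -> Callable[[int], int]:
    """Hypothetical helper: returns read_u(n) over a '0'/'1' string, MSB first."""
    pos = 0

    def read_u(n: int) -> int:
        nonlocal pos
        if pos + n > len(bits):
            raise EOFError("bitstream exhausted")
        value = int(bits[pos:pos + n] or "0", 2)
        pos += n
        return value

    return read_u


def parse_entries_shrinking(read_u: Callable[[int], int],
                            ref_pic_list_temp: Sequence[int],
                            num_active: int) -> List[int]:
    remaining = list(ref_pic_list_temp)  # working copy of RefPicListTempX
    rpl: List[int] = []
    for _ in range(num_active):
        # len(remaining) == NumPocTotalCurr - i, so this is
        # Ceil(Log2(NumPocTotalCurr - i)) bits.
        n_bits = (len(remaining) - 1).bit_length()
        entry = read_u(n_bits)  # in range 0 .. NumPocTotalCurr - (i + 1)
        if entry >= len(remaining):
            raise ValueError("list entry out of range")
        rpl.append(remaining.pop(entry))  # the chosen picture leaves the list
    return rpl
```

For a temporary list of POCs [10, 20, 30, 40], the bits "10" "00" "1" select 30, then 10, then 40. That takes 5 bits instead of the 6 bits needed with fixed 2-bit entries.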

D. Generalized Techniques for Conditional Signaling and Parsing of RPL Modification Flags.

FIG. 10 shows a generalized technique (1000) for conditional signaling of an RPL modification flag. A computing device that implements a video encoder, for example as described with reference to FIG. 3, can perform the technique (1000).

The device evaluates (1010) a condition. For example, the condition depends at least in part on a variable that indicates a number of total reference pictures. In some example implementations, the variable is NumPocTotalCurr, and the encoder checks whether the variable is greater than 1. Alternatively, the encoder evaluates other and/or additional conditions. The condition that is evaluated (1010) can include a single factor (e.g., the value of a variable that indicates a number of total reference pictures), or it can include multiple factors (e.g., the value of a variable that indicates a number of total reference pictures as well as one or more other factors). The condition can be evaluated (1010) as part of processing for an RPL modification structure. Or, the condition can be evaluated (1010) as part of processing for a slice header.

Depending on results of the evaluation, the device conditionally signals (1020) in a bitstream a flag that indicates whether an RPL is modified (e.g., replaced, adjusted) according to syntax elements explicitly signaled in the bitstream. For example, the flag is one of ref_pic_list_modification_flag_l0 or ref_pic_list_modification_flag_l1, and can be conditionally signaled as part of an RPL modification structure of a slice header. Or, after the condition is evaluated (1010), depending on the results of the evaluation, the RPL modification structure (including one or more flags that indicate whether an RPL is modified according to syntax elements explicitly signaled in the bitstream) is conditionally signaled in the bitstream.
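An encoder-side sketch of the technique (1000) follows, using the NumPocTotalCurr > 1 condition from the example implementations. The class `BitWriter`, the function name and the `modify` argument (the encoder's per-list decision to use explicit modification) are hypothetical. List entries are omitted to keep the focus on the flag.

```python
from typing import List, Sequence


class BitWriter:
    """Hypothetical writer that collects u(n) values as a '0'/'1' string."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    def u(self, value: int, n: int) -> None:
        if not 0 <= value < (1 << n):
            raise ValueError("value does not fit in n bits")
        if n:
            self.parts.append(format(value, f"0{n}b"))

    def getvalue(self) -> str:
        return "".join(self.parts)


def write_rpl_modification_flags(w: BitWriter, num_poc_total_curr: int,
                                 modify: Sequence[bool], is_b_slice: bool) -> None:
    for x in ([0, 1] if is_b_slice else [0]):
        # Evaluate the condition (1010) ...
        if num_poc_total_curr > 1:
            # ... and signal the flag (1020) only when it can carry information.
            w.u(1 if modify[x] else 0, 1)
        # Otherwise nothing is written; the decoder infers the flag to be 0.
```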

The device can repeat the technique (1000) on a slice-by-slice basis when an RPL modification structure is signaled, or on some other basis.

FIG. 11 shows a generalized technique (1100) for conditional parsing of an RPL modification flag. A computing device that implements a video decoder, for example as described with reference to FIG. 4, can perform the technique (1100).

The decoder evaluates (1110) a condition. For example, the condition depends at least in part on a variable that indicates a number of total reference pictures. In some example implementations, the variable is NumPocTotalCurr, and the decoder checks whether the variable is greater than 1. Alternatively, the decoder evaluates other and/or additional conditions. The condition that is evaluated (1110) can include a single factor (e.g., the value of a variable that indicates a number of total reference pictures), or it can include multiple factors (e.g., the value of a variable that indicates a number of total reference pictures as well as one or more other factors). The condition can be evaluated (1110) as part of processing for an RPL modification structure. Or, the condition can be evaluated (1110) as part of processing for a slice header.

Depending on results of the evaluation, the device conditionally parses (1120) from a bitstream a flag that indicates whether an RPL is modified (e.g., replaced, adjusted) according to syntax elements explicitly signaled in the bitstream. For example, the flag is one of ref_pic_list_modification_flag_l0 or ref_pic_list_modification_flag_l1, and can be conditionally parsed from an RPL modification structure of a slice header. Or, after the condition is evaluated (1110), depending on the results of the evaluation, the RPL modification structure (including one or more flags that indicate whether an RPL is modified according to syntax elements explicitly signaled in the bitstream) is conditionally parsed from the bitstream.

The device can repeat the technique (1100) on a slice-by-slice basis when an RPL modification structure is signaled, or on some other basis.

E. Generalized Techniques for Conditional Signaling and Parsing of List Entries.

FIG. 12 shows a generalized technique (1200) for conditional signaling of list entries for RPL modification. A computing device that implements a video encoder, for example as described with reference to FIG. 3, can perform the technique (1200).

The device evaluates (1210) a condition. For example, the condition depends at least in part on a variable that indicates a number of total reference pictures (e.g., NumPocTotalCurr in some example implementations). Or, the condition depends at least in part on a number of active reference pictures for the RPL. Or, the condition depends at least in part on whether weighted prediction is disabled. Different logic can be used to check whether weighted prediction is disabled depending on whether a current slice is a P slice or B slice and/or depending on which RPL is being signaled/parsed. For example, the logic for checking the condition for a first RPL (which might be used by a P slice or B slice) is different than the logic for checking the condition for a second RPL (which can be used only by a B slice). Alternatively, the encoder evaluates other and/or additional conditions.

Depending on results of the evaluation, the device conditionally signals (1220) in a bitstream one or more syntax elements for list entries that indicate how to modify (e.g., replace, adjust) an RPL. For example, the syntax element(s) for list entries are conditionally signaled as part of an RPL modification structure of a slice header.

In some example implementations, if (a) the number of total reference pictures is equal to 2 and (b) the number of active reference pictures for the RPL is equal to 1, then the syntax element(s) for list entries are absent from the bitstream, and a value is inferred for one of the list entries. In other example implementations, in addition to this condition, if (c) the number of total reference pictures is equal to 2, (d) the number of active reference pictures for the RPL is equal to 2 and (e) weighted prediction is disabled, then the one or more syntax elements for list entries are absent from the bitstream, and values are inferred for two of the list entries.
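One way an encoder might decide what to write for a single RPL under conditions (a) through (e) is sketched below. It assumes that the number of active entries does not exceed the number of total reference pictures. It also assumes the implicit RPL is the first pictures of the temporary RPL in order. The function name is hypothetical, and `entries[i]` is the desired index into the temporary RPL for position i.

```python
from typing import List, Sequence, Tuple


def plan_rpl_modification(entries: Sequence[int], num_poc_total_curr: int,
                          weighted_prediction_disabled: bool) -> Tuple[int, List[int]]:
    """Return (modification flag, list entries to write) for one RPL."""
    num_active = len(entries)
    if any(not 0 <= e < num_poc_total_curr for e in entries):
        raise ValueError("list entry out of range")
    flag = int(list(entries) != list(range(num_active)))
    if not flag:
        return 0, []  # the implicit RPL suffices; no list entries
    if num_poc_total_curr == 2 and num_active == 1:
        return 1, []  # (a), (b): decoder infers list_entry[0] = flag = 1
    if num_poc_total_curr == 2 and num_active == 2 and weighted_prediction_disabled:
        if list(entries) != [1, 0]:
            raise ValueError("only the swap [1, 0] is useful without weighted prediction")
        return 1, []  # (c), (d), (e): decoder infers entries 1 and 0
    return 1, list(entries)
```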

The device can repeat the technique (1200) on a slice-by-slice basis when an RPL modification structure is signaled, or on some other basis.

FIG. 13 shows a generalized technique (1300) for conditional parsing of list entries for RPL modification. A computing device that implements a video decoder, for example as described with reference to FIG. 4, can perform the technique (1300).

The decoder evaluates (1310) a condition. For example, the condition depends at least in part on a variable that indicates a number of total reference pictures (e.g., NumPocTotalCurr in some example implementations). Or, the condition depends at least in part on a number of active reference pictures for the RPL. Or, the condition depends at least in part on whether weighted prediction is disabled. Different logic can be used to check whether weighted prediction is disabled depending on whether a current slice is a P slice or B slice and/or depending on which RPL is being signaled/parsed. Alternatively, the decoder evaluates other and/or additional conditions.

Depending on results of the evaluation, the device conditionally parses (1320) from a bitstream one or more syntax elements for list entries that indicate how to modify (e.g., replace, adjust) an RPL. For example, the syntax element(s) for list entries are conditionally parsed from an RPL modification structure of a slice header.

In some example implementations, if (a) the number of total reference pictures is equal to 2 and (b) the number of active reference pictures for the RPL is equal to 1, then the syntax element(s) for list entries are absent from the bitstream, and a value is inferred for one of the list entries. In other example implementations, in addition to this condition, if (c) the number of total reference pictures is equal to 2, (d) the number of active reference pictures for the RPL is equal to 2 and (e) weighted prediction is disabled, then the one or more syntax elements for list entries are absent from the bitstream, and values are inferred for two of the list entries.

The device can repeat the technique (1300) on a slice-by-slice basis when an RPL modification structure is signaled, or on some other basis.

F. Generalized Techniques for Adjusting Signaling and Parsing of List Entries.

FIG. 14 shows a generalized technique (1400) for adjusting signaling of list entries for RPL modification. A computing device that implements a video encoder, for example as described with reference to FIG. 3, can perform the technique (1400).

The device evaluates (1410) a condition. For example, the condition depends at least in part on whether weighted prediction is disabled. Different logic can be used to check whether weighted prediction is disabled depending on whether a current slice is a P slice or B slice and/or depending on which RPL is being signaled/parsed. For example, the logic for checking the condition for a first RPL (which might be used by a P slice or B slice) is different than the logic for checking the condition for a second RPL (which can be used only by a B slice). Alternatively, the encoder evaluates other and/or additional conditions.

Depending on results of the evaluation, the device adjusts (1420) signaling in a bitstream of one or more syntax elements for list entries that indicate how to modify (e.g., replace, adjust) an RPL. In particular, the length (in bits) of at least one of the syntax element(s) is adjusted. For example, for an index i for the list entries, if weighted prediction is disabled, the length (in bits) of the at least one of the syntax elements decreases as i increases. In some example implementations, if weighted prediction is disabled, the length of a given syntax element for list entry[i] is Ceil(Log2(NumPocTotalCurr−i)) bits. Otherwise (weighted prediction is enabled), the length of the given syntax element for list entry[i] is Ceil(Log2(NumPocTotalCurr)) bits.
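The encoder-side length adjustment can be sketched as follows. It is combined with the shortened-list indexing described for FIG. 9, which is an assumption about how the shorter entries are interpreted. The function name is hypothetical, and `entries` are absolute indices into the temporary RPL. When weighted prediction is disabled, each index is rewritten relative to the pictures not yet placed, and it is written with Ceil(Log2(NumPocTotalCurr−i)) bits.

```python
from typing import List, Sequence


def ceil_log2(n: int) -> int:
    """Ceil(Log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def encode_list_entries(entries: Sequence[int], num_poc_total_curr: int,
                        weighted_prediction_disabled: bool) -> str:
    """Return the list_entry bits as a '0'/'1' string."""
    if any(not 0 <= e < num_poc_total_curr for e in entries):
        raise ValueError("list entry out of range")
    out: List[str] = []
    if not weighted_prediction_disabled:
        n_bits = ceil_log2(num_poc_total_curr)  # fixed length for every entry
        for entry in entries:
            if n_bits:
                out.append(format(entry, f"0{n_bits}b"))
        return "".join(out)
    remaining = list(range(num_poc_total_curr))  # indices not yet placed
    for entry in entries:
        rel = remaining.index(entry)  # raises ValueError if a picture is reused
        n_bits = ceil_log2(len(remaining))  # Ceil(Log2(NumPocTotalCurr - i))
        if n_bits:
            out.append(format(rel, f"0{n_bits}b"))
        remaining.pop(rel)
    return "".join(out)
```

For entries [2, 0, 3] with NumPocTotalCurr = 4, the adjusted lengths produce "10001" (5 bits). Fixed lengths produce "100011" (6 bits).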

The device can repeat the technique (1400) on a slice-by-slice basis when an RPL modification structure is signaled, or on some other basis.

FIG. 15 shows a generalized technique (1500) for adjusting parsing of list entries for RPL modification. A computing device that implements a video decoder, for example as described with reference to FIG. 4, can perform the technique (1500).

The decoder evaluates (1510) a condition. For example, the condition depends at least in part on whether weighted prediction is disabled. Different logic can be used to check whether weighted prediction is disabled depending on whether a current slice is a P slice or B slice and/or depending on which RPL is being signaled/parsed. Alternatively, the decoder evaluates other and/or additional conditions.

Depending on results of the evaluation, the device adjusts (1520) parsing from a bitstream of one or more syntax elements for list entries that indicate how to modify (e.g., replace, adjust) an RPL. In particular, the length (in bits) of at least one of the syntax element(s) is adjusted. For example, for an index i for the list entries, if weighted prediction is disabled, the length (in bits) of the at least one of the syntax elements decreases as i increases. In some example implementations, if weighted prediction is disabled, the length of a given syntax element for list entry[i] is Ceil(Log2(NumPocTotalCurr−i)) bits. Otherwise (weighted prediction is enabled), the length of the given syntax element for list entry[i] is Ceil(Log2(NumPocTotalCurr)) bits.

The device can repeat the technique (1500) on a slice-by-slice basis when an RPL modification structure is signaled, or on some other basis.

G. Alternatives.

FIGS. 7a, 7b, 10 and 11 illustrate conditional signaling and parsing of a flag such as ref_pic_list_modification_flag_l0 or ref_pic_list_modification_flag_l1 based on a condition. In this way, the signaling of additional RPL modification information (such as syntax elements for list entries) is controlled. As explained with reference to FIG. 7a, the signaling and parsing of an RPL modification flag can be controlled by evaluating the condition as part of a ref_pic_lists_modification( ) structure. Alternatively, as explained with reference to FIG. 7b, the signaling and parsing of the RPL modification structure (e.g., ref_pic_lists_modification( ) structure) can be controlled by evaluating the same condition as part of slice header processing or otherwise. For example, if the variable NumPocTotalCurr is greater than 1, then the ref_pic_lists_modification( ) structure is signaled. Otherwise (the variable NumPocTotalCurr is not greater than 1), the ref_pic_lists_modification( ) structure is not signaled, and the values of list entries are inferred as described above with reference to FIG. 7a. Extending FIG. 10, after the condition is evaluated, depending on results of the evaluation, the RPL modification syntax structure is conditionally signaled. Extending FIG. 11, after the condition is evaluated, depending on results of the evaluation, the RPL modification syntax structure is conditionally parsed.

For the sake of illustration, the detailed description includes various examples with specific names for some parameters and variables. The innovations described herein are not limited to implementations with parameters or variables having such names. Instead, the innovations described herein can be implemented with various types of parameters and variables.

H. Additional Innovative Features.

In addition to the claims, innovative features described herein include, but are not limited to, the features shown in the following table.

# Feature A. Conditional Signaling of Syntax Elements for List Entriesof an RPL A1 A method performed by a video encoder, comprising:evaluating a condition; and depending on results of the evaluating,conditionally signaling in a bitstream one or more syntax elements forlist entries that indicate how to modify an RPL. A2 A method performedby a video decoder, comprising: evaluating a condition; and depending onresults of the evaluating, conditionally parsing from a bitstream one ormore syntax elements for list entries that indicate how to modify anRPL. A3 The method of feature A1 or A2 wherein the condition depends atleast in part on a variable that indicates a number of total referencepictures. A4 The method of feature A3 wherein the variable isNumPocTotalCurr. A5 The method of any one of features A1-A4 wherein thecondition depends at least in part on a number of active referencepictures for the RPL. A6 The method of any one of features A1-A5 whereinthe condition depends at least in part on whether weighted prediction isdisabled. A7 The method of feature A6 wherein different logic is used tocheck whether weighted prediction is disabled depending on whether acurrent slice is a P slice or B slice and/or depending on which RPL isbeing signaled/parsed. A8 The method of feature A1 or A2 wherein thecondition depends at least in part on whether (a) a number of totalreference pictures is equal to 2 and (b) a number of active referencepictures for the RPL is equal to 1. A9 The method of feature A8 wherein,if (a) the number of total reference pictures is equal to 2 and (b) thenumber of active reference pictures for the RPL is equal to 1, then theone or more syntax elements for list entries are absent from thebitstream, and a value is inferred for one of the list entries. 
A10 Themethod of feature A1 or A2 wherein the condition depends at least inpart on whether (c) the number of total reference pictures is equal to2, (d) the number of active reference pictures for the RPL is equal to 2and (e) weighted prediction is disabled. A11 The method of feature A10wherein if (c) the number of total reference pictures is equal to 2, (d)the number of active reference pictures for the RPL is equal to 2 and(e) weighted prediction is disabled, then the one or more syntaxelements for list entries are absent from the bitstream, and values areinferred for two of the list entries. A12 The method of any one offeatures A1-A11 wherein the one or more syntax elements for list entriesare conditionally signaled as part of an RPL modification structure of aslice header. A13 The method of any one of features A1-A12 wherein theRPL is an RPL 0 associated with a P slice. A14 The method of any one offeatures A1-A12 further comprising repeating the evaluating and theconditional signaling or parsing for each of multiple RPLs, wherein themultiple RPLs include an RPL 0 and RPL 1 associated with a B slice. A15A computing device adapted to perform the method of any one of featuresA1-A14. A16 A tangible computer-readable media storingcomputer-executable instructions for causing a computing device toperform the method of any one of features A1-14. B. Adjusting Length ofSyntax Elements for List Entries of an RPL B1 A method performed by avideo encoder, comprising: evaluating a condition; and depending onresults of the evaluating, adjusting signaling in a bitstream of one ormore syntax elements for list entries that indicate how to modify anRPL, wherein length of at least one of the one or more syntax elementsis adjusted. 
B2 A method performed by a video decoder, comprising:evaluating a condition; and depending on results of the evaluating,adjusting parsing from a bitstream of one or more syntax elements forlist entries that indicate how to modify an RPL, wherein length of atleast one of the one or more syntax elements is adjusted. B3 The methodof feature B1 or B2 wherein the condition depends at least in part onwhether weighted prediction is disabled. B4 The method of feature B3wherein different logic is used to check whether weighted prediction isdisabled depending on whether a current slice is a P slice or B sliceand/or depending on which RPL is being signaled/parsed. B5 The method offeature B3 wherein for an index i for the list entries, if weightedprediction is disabled, the length of the at least one of the syntaxelements decreases as i increases. B6 The method of feature B3 wherein,for an index i for the list entries: if weighted prediction is disabled,the length of a given syntax element for list entry[i] isCeil(Log2(NumPocTotalCurr−i)) bits; and if weighted prediction isenabled, the length of the given syntax element for list entry[i] isCeil(Log2(NumPocTotalCurr)) bits. B7 A computing device adapted toperform the method of any one of features B1-B6. B8 A tangiblecomputer-readable media storing computer-executable instructions forcausing a computing device to perform the method of any one of featuresB1-B6. C. General C1 A method performed by an encoder, comprising:encoding video; and outputting at least part of a bitstream includingthe encoded video, including signaling RPL information according to oneof the innovations described herein. C2 A method performed by a decoder,comprising: receiving at least part of a bitstream including encodedvideo, including parsing RPL information signaled according to one ofthe innovations described herein; and decoding the encoded video. C3 Acomputing device adapted to perform the method of feature C1 or C2. 
C4 One or more tangible computer-readable media storing computer-executable instructions for causing a computing device to perform the method of feature C1 or C2.
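The bit budget behind features A10/A11 and B5/B6 can be illustrated with a minimal Python sketch. All function names are hypothetical; `num_active` stands for the number of active reference pictures for the RPL and `weighted_pred` for whether weighted prediction is enabled. Two points are assumptions of the sketch rather than statements from this text: that, with weighted prediction disabled, list entry[i] only needs to address the NumPocTotalCurr−i pictures not yet placed in the list (which is what makes the shrinking length sufficient), and that the values inferred under A11 are the swap [1, 0], since with two distinct pictures in two slots that is the only order differing from the initial [0, 1].

```python
from typing import List


def ceil_log2(x: int) -> int:
    """Ceil(Log2(x)) for integer x >= 1, computed exactly without floating point."""
    if x < 1:
        raise ValueError("x must be >= 1")
    return (x - 1).bit_length()


def list_entries_absent(num_poc_total_curr: int, num_active: int,
                        weighted_pred: bool) -> bool:
    """Features A10/A11: list entries are not signaled when (c), (d) and (e) all hold."""
    return num_poc_total_curr == 2 and num_active == 2 and not weighted_pred


def inferred_list_entries() -> List[int]:
    """Assumed inferred values under A11: the swap of the two reference pictures."""
    return [1, 0]


def list_entry_lengths(num_poc_total_curr: int, num_active: int,
                       weighted_pred: bool) -> List[int]:
    """Feature B6: bit length of the syntax element for list entry[i]."""
    if not weighted_pred and num_active > num_poc_total_curr:
        # Assumption of this sketch: without weighted prediction each picture is
        # used at most once, so there cannot be more entries than pictures.
        raise ValueError("num_active exceeds NumPocTotalCurr with weighted prediction disabled")
    lengths = []
    for i in range(num_active):
        if weighted_pred:
            # Any picture may repeat (e.g., with different weights): fixed length.
            lengths.append(ceil_log2(num_poc_total_curr))
        else:
            # Only the pictures not yet placed remain as candidates.
            lengths.append(ceil_log2(num_poc_total_curr - i))
    return lengths
```

For example, with NumPocTotalCurr equal to 5 and four active entries, the lengths are [3, 2, 2, 1] (8 bits) with weighted prediction disabled versus [3, 3, 3, 3] (12 bits) with it enabled. With NumPocTotalCurr equal to 2 and two active entries, the lengths would be [1, 0], and A11 removes even that single bit.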

In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.

We claim:
 1. A computing device that implements a video encoder, wherein the computing device is adapted to perform a method comprising: evaluating a condition, wherein the condition depends at least in part on a variable that indicates a number of total reference pictures; and depending on results of the evaluating, conditionally signaling in a bitstream a flag that indicates whether a reference picture list (“RPL”) is modified according to syntax elements explicitly signaled in the bitstream.
 2. The computing device of claim 1 wherein the condition depends on whether the value of the variable is greater than 1.
 3. The computing device of claim 1 wherein the flag is a first flag that indicates whether a first RPL is modified according to syntax elements explicitly signaled in the bitstream or a second flag that indicates whether a second RPL is modified according to syntax elements explicitly signaled in the bitstream.
 4. The computing device of claim 1 wherein the flag is conditionally signaled as part of an RPL modification structure of a slice header.
 5. The computing device of claim 1 wherein the condition is evaluated as part of processing for an RPL modification structure that includes the flag.
 6. The computing device of claim 1 wherein the condition is evaluated as part of processing for a slice header.
 7. The computing device of claim 6 wherein an RPL modification structure including the flag is conditionally signaled depending on results of the evaluation.
 8. The computing device of claim 1 wherein, during encoding, the video encoder evaluates (a) results of motion compensation for which the RPL is modified according to syntax elements explicitly signaled in the bitstream and (b) results of motion compensation for which the RPL is not modified according to syntax elements explicitly signaled in the bitstream, and modifies the RPL so as to reorder one or more reference pictures for more efficient addressing with reference indices.
 9. The computing device of claim 1 wherein, during encoding, the video encoder evaluates (a) results of motion compensation for which the RPL is modified according to syntax elements explicitly signaled in the bitstream and (b) results of motion compensation for which the RPL is not modified according to syntax elements explicitly signaled in the bitstream, and modifies the RPL so as to remove one or more reference pictures based at least in part on frequency of use during encoding.
 10. The computing device of claim 1 wherein, during encoding, the video encoder evaluates (a) results of motion compensation for which the RPL is modified according to syntax elements explicitly signaled in the bitstream and (b) results of motion compensation for which the RPL is not modified according to syntax elements explicitly signaled in the bitstream, and modifies the RPL so as to add one or more reference pictures based at least in part on frequency of use during encoding.
 11. A method performed by a video decoder, comprising: evaluating a condition, wherein the condition depends at least in part on a variable that indicates a number of total reference pictures; and depending on results of the evaluating, conditionally parsing from a bitstream a flag that indicates whether a reference picture list (“RPL”) is modified according to syntax elements explicitly signaled in the bitstream.
 12. The method of claim 11 wherein the condition depends on whether the value of the variable is greater than 1.
 13. The method of claim 11 wherein the flag is a first flag that indicates whether a first RPL is modified according to syntax elements explicitly signaled in the bitstream or a second flag that indicates whether a second RPL is modified according to syntax elements explicitly signaled in the bitstream.
 14. The method of claim 11 wherein the flag is conditionally parsed from an RPL modification structure of a slice header.
 15. The method of claim 11 wherein the condition is evaluated as part of processing for an RPL modification structure that includes the flag.
 16. The method of claim 11 wherein the condition is evaluated as part of processing for a slice header.
 17. The method of claim 16 wherein an RPL modification structure including the flag is conditionally parsed depending on results of the evaluation.
 18. One or more computer-readable media having stored thereon computer-executable instructions for causing a processing unit programmed thereby to perform a method comprising: evaluating a condition as part of processing for a slice header, wherein the condition depends at least in part on a variable that indicates a number of total reference pictures; and depending on results of the evaluating, conditionally signaling in or parsing from a bitstream a reference picture list (“RPL”) modification structure of the slice header, wherein the RPL modification structure includes a flag that indicates whether an RPL is modified according to syntax elements explicitly signaled in the bitstream.
 19. The one or more computer-readable media of claim 18 wherein the condition depends on whether the value of the variable is greater than 1.
 20. The one or more computer-readable media of claim 18 wherein the method further comprises, during encoding: evaluating results of motion compensation for which the RPL is modified according to syntax elements explicitly signaled in the bitstream; evaluating results of motion compensation for which the RPL is not modified according to syntax elements explicitly signaled in the bitstream; and modifying the RPL so as to accomplish one or more of (a) reordering one or more reference pictures for more efficient addressing with reference indices, (b) removing one or more reference pictures based at least in part on frequency of use during encoding, and (c) adding one or more reference pictures based at least in part on frequency of use during encoding.
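The conditional flag signaling recited in the claims (a flag sent only when the variable for the number of total reference pictures, NumPocTotalCurr, is greater than 1) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a conforming implementation: `BitWriter`, `BitReader`, `write_rpl_modification` and `parse_rpl_modification` are hypothetical helpers, only the modification flags are modeled (list entries are omitted), and an absent flag is assumed to be inferred as "not modified". The check wraps the whole structure, as in the slice-header variant; for the flags alone, checking inside the structure gives the same bits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BitWriter:
    """Hypothetical MSB-first bit writer used only for this sketch."""
    bits: List[int] = field(default_factory=list)

    def put_bits(self, value: int, n: int) -> None:
        # Append the n least significant bits of value, most significant first.
        for shift in range(n - 1, -1, -1):
            self.bits.append((value >> shift) & 1)


@dataclass
class BitReader:
    """Hypothetical MSB-first bit reader over a list of 0/1 values."""
    bits: List[int]
    pos: int = 0

    def get_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def write_rpl_modification(w: BitWriter, num_poc_total_curr: int,
                           modified_flags: List[bool]) -> None:
    """Encoder side: one flag per RPL (RPL 0 for a P slice, RPL 0 and RPL 1 for a B slice).

    With fewer than two total reference pictures there is no order to change,
    so no flag bits are written at all.
    """
    if num_poc_total_curr > 1:
        for flag in modified_flags:
            w.put_bits(int(flag), 1)


def parse_rpl_modification(r: BitReader, num_poc_total_curr: int,
                           num_lists: int) -> List[bool]:
    """Decoder side: evaluates the same condition before parsing.

    Assumption of this sketch: an absent flag is inferred as False (RPL not modified).
    """
    if num_poc_total_curr > 1:
        return [bool(r.get_bits(1)) for _ in range(num_lists)]
    return [False] * num_lists
```

For example, a slice with NumPocTotalCurr equal to 1 spends no bits on the flags, while a B slice with three total reference pictures spends one bit per list, and the decoder recovers the same flags because it evaluates the same condition.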